Eligible Papers – Microarch

MICRO 2000

Paper Title	Authors
Eager Writeback – A Technique for Improving Bandwidth Utilization	Hsien-Hsin S. Lee, Gary S. Tyson, Matthew K. Farrens
Silent Stores for Free	Kevin M. Lepak, Mikko H. Lipasti
Predictor-Directed Stream Buffers	Timothy Sherwood, Suleyman Sair, Brad Calder
On Pipelining Dynamic Instruction Scheduling Logic	Jared Stark, Mary D. Brown, Yale N. Patt
The Impact of Delay on the Design of Branch Predictors	Daniel A. Jiménez, Stephen W. Keckler, Calvin Lin
Improving BTB Performance in the Presence of DLLs	Stevan A. Vlaovic, Edward S. Davidson, Gary S. Tyson
Efficient Checker Processor Design	Saugata Chatterjee, Christopher T. Weaver, Todd M. Austin
An Integrated Approach to Accelerate Data and Predicate Computations in Hyperblocks	Alexandre E. Eichenberger, Waleed Meleis, Suman Maradani
Accurate and Efficient Predicate Analysis with Binary Decision Diagrams	John W. Sias, Wen-mei W. Hwu, David I. August
Modulo Scheduling for a Fully-Distributed Clustered VLIW Architecture	F. Jesús Sánchez, Antonio González
Two-Level Hierarchical Register File Organization for VLIW Processors	Javier Zalamea, Josep Llosa, Eduard Ayguadé, Mateo Valero
PipeRench Implementation of the Instruction Path Coprocessor	Yuan C. Chou, Pazhani Pillai, Herman Schmit, John Paul Shen
Efficient Conditional Operations for Data-Parallel Architectures	Ujval J. Kapasi, William J. Dally, Scott Rixner, Peter R. Mattson, John D. Owens, Brucek Khailany
Flexible Hardware Acceleration for Multimedia Oriented Microprocessors	Frederik Vermeulen, Lode Nachtergaele, Francky Catthoor, Diederik Verkest, Hugo De Man
Very Low Power Pipelines Using Significance Compression	Ramon Canal, Antonio González, James E. Smith
A Static Power Model for Architects	J. Adam Butts, Gurindar S. Sohi
A Framework for Dynamic Energy Efficiency and Temperature Management	Michael C. Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas
Dynamic Zero Compression for Cache Energy Reduction	Luis Villa, Michael Zhang, Krste Asanović
Register Integration: A Simple and Efficient Implementation of Squash Reuse	Amir Roth, Gurindar S. Sohi
The Store-Load Address Table and Speculative Register Promotion	Matt Postiff, David A. Greene, Trevor N. Mudge
Memory Hierarchy Reconfiguration for Energy and Performance in General-Purpose Processor Architectures	Rajeev Balasubramonian, David H. Albonesi, Alper Buyuktosunoglu, Sandhya Dwarkadas
Frequent Value Compression in Data Caches	Jun Yang, Youtao Zhang, Rajiv Gupta
A Study of Slipstream Processors	Zachary Purser, Karthik Sundaramoorthy, Eric Rotenberg
Relational Profiling: Enabling Thread-Level Parallelism in Virtual Machines	Timothy H. Heil, James E. Smith
Calpa: A Tool for Automating Selective Dynamic Compilation	Markus Mock, Craig Chambers, Susan J. Eggers
Increasing the Size of Atomic Instruction Blocks Using Control Flow Assertions	Sanjay J. Patel, Tony Tung, Satarupa Bose, Matthew M. Crum
Reducing Wire Delay Penalty Through Value Prediction	Joan-Manuel Parcerisa, Antonio González
Compiler Controlled Value Prediction Using Branch Predictor Based Confidence	Eric Larson, Todd M. Austin
Instruction Distribution Heuristics for Quad-Cluster, Dynamically-Scheduled, Superscalar Processors	Amirali Baniasadi, Andreas Moshovos
Performance Improvement with Circuit-Level Speculation	Tong Liu, Shih-Lien Lu

MICRO 2001

Paper Title	Authors
Skipper: A Microarchitecture for Exploiting Control-Flow Independence	Chen-Yong Cher, T. N. Vijaykumar
Performance Characterization of a Hardware Mechanism for Dynamic Optimization	Brian Fahs, Satarupa Bose, Matthew Crum, Brian Slechta, Francesco Spadini, Tony Tung, Sanjay J. Patel, Steven S. Lumetta
Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems	Eric Rotenberg
A Design Space Evaluation of Grid Processor Architectures	Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, Stephen W. Keckler
Reducing Set-Associative Cache Energy via Way-Prediction and Selective Direct-Mapping	Michael D. Powell, Amit Agarwal, T. N. Vijaykumar, Babak Falsafi, Kaushik Roy
A Code Decompression Architecture for VLIW Processors	Yuan Xie, Wayne Wolf, Haris Lekatsas
Direct Load: Dependence-Linked Dataflow Resolution of Load Address and Cache Coordinate	Byung-Kwon Chung, Jinsuo Zhang, Jih-Kwon Peir, Shih-Chang Lai, Konrad Lai
Reducing Power Requirements of Instruction Scheduling Through Dynamic Allocation of Multiple Datapath Resources	Dmitry Ponomarev, Gurhan Kucuk, Kanad Ghose
Exploiting VLIW Schedule Slacks for Dynamic and Leakage Energy Reduction	Wensheng Zhang, Vijaykrishnan Narayanan, Mahmut Kandemir, Mary Jane Irwin, David Duarte, Yuh-Fang Tsai
Reducing Power with Dynamic Critical Path Information	John S. Seng, Eric S. Tune, Dean M. Tullsen
Direct Addressed Caches for Reduced Power Consumption	Emmett Witchel, Sam Larsen, C. Scott Ananian, Krste Asanović
Modulo Schedule Buffers	Matthew C. Merten, Wen-mei W. Hwu
Graph-Partitioning Based Instruction Scheduling for Clustered Processors	Alex Aletà, Josep M. Codina, Jesús Sánchez, Antonio González
Modulo Scheduling with Integrated Register Spilling for Clustered VLIW Architectures	Javier Zalamea, Josep Llosa, Eduard Ayguadé, Mateo Valero
Efficient Static Single Assignment Form for Predication	Arthur Stoutchinin, Francois de Ferriere
The Impact of If-Conversion and Branch Prediction on Program Execution on the Intel® Itanium™ Processor	Youngsoo Choi, Allan Knies, Luke Gerke, Tin-Fook Ngai
Mapping Reference Code to Irregular DSPs Within the Retargetable, Optimizing Compiler COGEN(T)	Gary William Gréwal, Charles Thomas Wilson
Select-Free Instruction Scheduling Logic	Mary D. Brown, Jared Stark, Yale N. Patt
Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery	Joydeep Ray, James C. Hoe, Babak Falsafi
A High-Speed Dynamic Instruction Scheduling Scheme for Superscalar Processors	Masahiro Goshima, Kengo Nishino, Toshiaki Kitamura, Yasuhiko Nakashima, Shinji Tomita, Shin-ichiro Mori
Reducing the Complexity of the Register File in Dynamic Superscalar Processors	Rajeev Balasubramonian, Sandhya Dwarkadas, David H. Albonesi
Saving Energy with Architectural and Frequency Adaptations for Multimedia Applications	Christopher J. Hughes, Jayanth Srinivasan, Sarita V. Adve
Enhancing Loop Buffering of Media and Telecommunications Applications Using Low-Overhead Predication	John W. Sias, Hillery C. Hunter, Wen-mei W. Hwu
Cool-Cache for Hot Multimedia	Osman S. Unsal, Raksit Ashok, Israel Koren, C. Mani Krishna, Csaba Andras Moritz
ZR: A 3D API Transparent Technology for Chunk Rendering	Emile Hsieh, Vladimir Pentkovski, Thomas Piazza
Dynamic Speculative Precomputation	Jamison D. Collins, Dean M. Tullsen, Hong Wang, John P. Shen
Handling Long-Latency Loads in a Simultaneous Multithreading Processor	Dean M. Tullsen, Jeffery A. Brown
Correctly Implementing Value Prediction in Microprocessors That Support Multithreading or Multiprocessing	Milo M. K. Martin, Daniel J. Sorin, Harold W. Cain, Mark D. Hill, Mikko H. Lipasti

MICRO 2002

Paper Title	Authors
Vacuum Packing – Extracting Hardware-Detected Program Phases for Post-Link Optimization	Ronald D. Barnes, Erik M. Nystrom, Matthew C. Merten, Wen-mei W. Hwu
Power Protocol – Reducing Power Dissipation on Off-Chip Data Buses	K. Basu, Alok N. Choudhary, Jayaprakash Pisharath, Mahmut T. Kandemir
Hierarchical Scheduling Windows	Edward Brekelbaum, Jeff Rupley, Chris Wilkerson, Bryan Black
Characterizing and Predicting Value Degree of Use	J. Adam Butts, Gurindar S. Sohi
Microarchitectural Support for Precomputation Microthreads	Robert S. Chappell, Francis Tseng, Adi Yoaz, Yale N. Patt
Pointer Cache Assisted Prefetching	Jamison D. Collins, Suleyman Sair, Brad Calder, Dean M. Tullsen
Three-Dimensional Memory Vectorization for High Bandwidth Media Memory Systems	Jesús Corbal, Roger Espasa, Mateo Valero
DELI – A New Run-Time Control Point	Giuseppe Desoli, Nikolay Mateev, Evelyn Duesterwald, Paolo Faraboschi, Joseph A. Fisher
Managing Static Leakage Energy in Microprocessor Functional Units	Steve Dropsho, Volkan Kursun, David H. Albonesi, Sandhya Dwarkadas, Eby G. Friedman
A Faster Optimal Register Allocator	Changqing Fu, Kent D. Wilken
Effective Instruction Scheduling Techniques for an Interleaved Cache Clustered VLIW Processor	Enric Gibert, F. Jesús Sánchez, Antonio González
Microarchitectural Denial of Service – Insuring Microarchitectural Fairness	Dirk Grunwald, Soraya Ghiasi
Dynamic Addressing Memory Arrays with Physical Locality	Steven Hsu, Shih-Lien Lu, Shih-Chang Lai, Ram Krishnamurthy, Konrad Lai
Generating Physical Addresses Directly for Saving Instruction TLB Energy	Ismail Kadayif, Anand Sivasubramaniam, Mahmut T. Kandemir, Gokul B. Kandiraju, Guangyu Chen
Drowsy Instruction Caches – Leakage Power Reduction Using Dynamic Voltage Scaling and Cache Sub-Bank Prediction	Nam Sung Kim, Krisztián Flautner, David T. Blaauw, Trevor N. Mudge
Vector vs. Superscalar and VLIW Architectures for Embedded Multimedia Benchmarks	Christoforos E. Kozyrakis, David A. Patterson
Compiling for Instruction Cache Performance on a Multithreaded Architecture	Rakesh Kumar, Dean M. Tullsen
Convergent Scheduling	Walter Lee, Diego Puppin, Shane Swenson, Saman P. Amarasinghe
Reduced Code Size Modulo Scheduling in the Absence of Hardware Support	Josep Llosa, Stefan M. Freudenberger
Exploiting Data-Width Locality to Increase Superscalar Execution Bandwidth	Gabriel H. Loh
Cherry – Checkpointed Early Resource Recycling in Out-of-Order Microprocessors	José F. Martínez, Jose Renau, Michael C. Huang, Milos Prvulovic, Josep Torrellas
Instruction Fetch Deferral Using Static Slack	Gregory A. Muthler, David Crowe, Sanjay J. Patel, Steven Lumetta
Reducing Register Ports for Higher Speed and Lower Energy	Il Park, Michael D. Powell, T. N. Vijaykumar
Three Extensions to Register Integration	Vlad Petric, Anne Bracy, Amir Roth
Fetching Instruction Streams	Alex Ramírez, Oliverio J. Santana, Josep-Lluís Larriba-Pey, Mateo Valero
A Quantitative Framework for Automated Pre-Execution Thread Selection	Amir Roth, Gurindar S. Sohi
Dynamic Frequency and Voltage Control for a Multiple Clock Domain Microarchitecture	Greg Semeraro, David H. Albonesi, Steve Dropsho, Grigorios Magklis, Sandhya Dwarkadas, Michael L. Scott
Register Write Specialization Register Read Specialization – A Path to Complexity-Effective Wide-Issue Superscalar Processors	André Seznec, Eric Toullec, Olivier Rochecouste
Optimizing Pipelines for Power and Performance	Viji Srinivasan, David M. Brooks, Michael Gschwind, Pradip Bose, Victor V. Zyuban, Philip N. Strenski, Philip G. Emma
Using Modern Graphics Architectures for General-Purpose Computing – A Framework and Analysis	Chris J. Thompson, Sahngyun Hahn, Mark Oskin
Microarchitectural Exploration with Liberty	Manish Vachharajani, Neil Vachharajani, David A. Penry, Jason A. Blome, David I. August
Orion – A Power-Performance Simulator for Interconnection Networks	Hangsheng Wang, Xinping Zhu, Li-Shiuan Peh, Sharad Malik
Compiler Managed Micro-Cache Bypassing for High Performance EPIC Processors	Youfeng Wu, Ryan N. Rakvic, Li-Ling Chen, Chyi-Chang Miao, George Chrysos, Jesse Fang
Energy Efficient Frequent Value Data Cache Design	Jun Yang, Rajiv Gupta
Compiler-Directed Instruction Cache Leakage Optimization	Wei Zhang, Jie S. Hu, Vijay Degalahal, Mahmut T. Kandemir, Narayanan Vijaykrishnan, Mary Jane Irwin
Master/Slave Speculative Parallelization	Craig B. Zilles, Gurindar S. Sohi

MICRO 2003

Paper Title	Authors
VSV: L2-Miss-Driven Variable Supply-Voltage Scaling for Low Power	Hai Li, Chen-Yong Cher, T. N. Vijaykumar, Kaushik Roy
A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor	Shubhendu S. Mukherjee, Christopher T. Weaver, Joel S. Emer, Steven K. Reinhardt, Todd M. Austin
TLC: Transmission Line Caches	Bradford M. Beckmann, David A. Wood
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures	Zeshan Chishti, Michael D. Powell, T. N. Vijaykumar
Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches	Se-Hyun Yang, Babak Falsafi
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data	Canturk Isci, Margaret Martonosi
Power-Driven Design of Router Microarchitectures in On-Chip Networks	Hangsheng Wang, Li-Shiuan Peh, Sharad Malik
Optimum Power/Performance Pipeline Depth	Allan Hartstein, Thomas R. Puzak
Processor Acceleration Through Automated Instruction Set Customization	Nathan Clark, Hongtao Zhong, Scott A. Mahlke
The Reconfigurable Streaming Vector Processor (RSVP™)	Silviu M. S. A. Chiricescu, Ray Essick, Brian Lucas, Phil May, Kent Moat, Jim Norris, Michael A. Schuette, Ali Saidi
Scaling and Characterizing Database Workloads: Bridging the Gap Between Research and Practice	Richard A. Hankins, Trung A. Diep, Murali Annavaram, Brian Hirano, Harald Eri, Hubert Nueckel, John Paul Shen
Generational Cache Management of Code Traces in Dynamic Optimization Systems	Kim M. Hazelwood, Michael D. Smith
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System	Jiwei Lu, Howard Chen, Rao Fu, Wei-Chung Hsu, Bobbie Othmer, Pen-Chung Yew, Dong-yuan Chen
IA-32 Execution Layer: A Two-Phase Dynamic Translator Designed to Support IA-32 Applications on Itanium-Based Systems	Leonid Baraz, Tevi Devor, Orna Etzion, Shalom Goldenberg, Alex Skaletsky, Yun Wang, Yigel Zemach
LLVA: A Low-level Virtual Instruction Set Architecture	Vikram S. Adve, Chris Lattner, Michael Brukman, Anand Shukla, Brian Gaeke
Comparing Program Phase Detection Techniques	Ashutosh S. Dhodapkar, James E. Smith
Using Interaction Costs for Microarchitectural Bottleneck Analysis	Brian A. Fields, Rastislav Bodík, Mark D. Hill, Chris J. Newburn
Fast Path-Based Neural Branch Prediction	Daniel A. Jiménez
Hardware Support for Control Transfers in Code Caches	Ho-Seop Kim, James E. Smith
Exploiting Value Locality in Physical Register Files	Saisanthosh Balakrishnan, Gurindar S. Sohi
Macro-Op Scheduling: Relaxing Scheduling Loop Constraints	Ilhyun Kim, Mikko H. Lipasti
WaveScalar	Steven Swanson, Ken Michelson, Andrew Schwerin, Mark Oskin
Universal Mechanisms for Data-Parallel Architectures	Karthikeyan Sankaralingam, Stephen W. Keckler, William R. Mark, Doug Burger
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors	Enric Gibert, F. Jesús Sánchez, Antonio González
Instruction Replication for Clustered Microarchitectures	Alex Aletà, Josep M. Codina, Antonio González, David R. Kaeli
Efficient Memory Integrity Verification and Encryption for Secure Processors	G. Edward Suh, Dwaine E. Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas
Fast Secure Processor for Inhibiting Software Piracy and Tampering	Jun Yang, Youtao Zhang, Lan Gao
IPStash: A Power-Efficient Memory Architecture for IP-Lookup	Stefanos Kaxiras, Georgios Keramidas
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers	Jorge García-Vidal, Jesús Corbal, Llorenç Cerdà, Mateo Valero
Beating In-Order Stalls with “Flea-Flicker” Two-Pass Pipelining	Ronald D. Barnes, Erik M. Nystrom, John W. Sias, Sanjay J. Patel, Nacho Navarro, Wen-mei W. Hwu
Scalable Hardware Memory Disambiguation for High ILP Processors	Simha Sethumadhavan, Rajagopalan Desikan, Doug Burger, Charles R. Moore, Stephen W. Keckler
Reducing Design Complexity of the Load/Store Queue	Il Park, Chong-liang Ooi, T. N. Vijaykumar
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors	Haitham Akkary, Ravi Rajwar, Srikanth T. Srinivasan

MICRO 2004

Paper Title	Authors
The Fuzzy Correlation Between Code and Performance Predictability	Murali Annavaram, Ryan N. Rakvic, Marzia Polito, Jean-Yves Bouguet, Richard A. Hankins, Bob Davies
Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery	David N. Armstrong, Hyesoon Kim, Onur Mutlu, Yale N. Patt
Cache Refill/Access Decoupling for Vector Machines	Christopher Batten, Ronny Krashinsky, Steve Gerding, Krste Asanović
Managing Wire Delay in Large Chip-Multiprocessor Caches	Bradford M. Beckmann, David A. Wood
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth	Anne Bracy, Prashant Prahlad, Amir Roth
Automatic Synthesis of High-Speed Processor Simulators	Martin Burtscher, Ilya Ganusov
Dynamically Controlled Resource Allocation in SMT Processors	Francisco J. Cazorla, Alex Ramírez, Mateo Valero, Enrique Fernández
Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization	Nathan Clark, Manjunath Kudlur, Hyunchul Park, Scott A. Mahlke, Krisztián Flautner
Control Flow Optimization Via Dynamic Reconvergence Prediction	Jamison D. Collins, Dean M. Tullsen, Hong Wang
Minos: Control Data Attack Prevention Orthogonal to Memory Model	Jedidiah R. Crandall, Frederic T. Chong
A Hardware-Software Platform for Intrusion Prevention	Milenko Drinic, Darko Kirovski
Dynamically Trading Frequency for Complexity in a GALS Microprocessor	Steven G. Dropsho, Greg Semeraro, David H. Albonesi, Grigorios Magklis, Michael L. Scott
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure	Oguz Ergin, Deniz Balkan, Kanad Ghose, Dmitry V. Ponomarev
Compiler Optimizations for Transaction Processing Workloads on Itanium Linux Systems	Gerolf Hoflehner, Knud Kirkegaard, Rod Skinner, Daniel M. Lavery, Yong-Fong Lee, Wei Li
Adaptive History-Based Memory Schedulers	Ibrahim Hur, Calvin Lin
Conjoined-Core Chip Multiprocessing	Rakesh Kumar, Norman P. Jouppi, Dean M. Tullsen
A Case for Clumsy Packet Processors	Arindam Mallik, Gokhan Memik
Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation	Harish Patil, Robert S. Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms	Daniel Gracia Pérez, Gilles Mouchard, Olivier Temam
Memory Controller Optimizations for Web Servers	Scott Rixner
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication	Peter G. Sassone, D. Scott Wills
Thermal Modeling, Characterization and Management of On-Chip Networks	Li Shang, Li-Shiuan Peh, Amit Kumar, Niraj K. Jha
Optimal Superblock Scheduling Using Enumeration	Ghassan Shobaki, Kent D. Wilken
Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures	Jared C. Smolens, Jangwoo Kim, James C. Hoe, Babak Falsafi
Hardware and Binary Modification Support for Code Pointer Protection From Buffer Overflow	Nathan Tuck, Brad Calder, George Varghese
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy	Eric Tune, Rakesh Kumar, Dean M. Tullsen, Brad Calder
RIFLE: An Architectural Framework for User-Centric Information-Flow Security	Neil Vachharajani, Matthew J. Bridges, Jonathan Chang, Ram Rangan, Guilherme Ottoni, Jason A. Blome, George A. Reis, Manish Vachharajani, David I. August
Whole Execution Traces	Xiangyu Zhang, Rajiv Gupta
AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants	Pin Zhou, Wei Liu, Long Fei, Shan Lu, Feng Qin, Yuanyuan Zhou, Samuel P. Midkiff, Josep Torrellas