Annual IEEE/ACM International Symposium on Microarchitecture®

MICRO Test of Time Award

List of Eligible Papers for the 2018 Award

View the 2018 call for nominations.

MICRO 1996

Paper Title | Authors
A Persistent Rescheduled-Page Cache for Low Overhead Object Code Compatibility in VLIW Architectures | Thomas M. Conte, Sumedh W. Sathaye, Sanjeev Banerjia
Integrating a Misprediction Recovery Cache (MRC) Into a Superscalar Pipeline | James O. Bondi, Ashwini K. Nanda, Simonjit Dutta
Accurate and Practical Profile-Driven Compilation Using the Profile Buffer | Thomas M. Conte, Kishore N. Menezes, Mary Ann Hirsch
Efficient Path Profiling | Thomas Ball, James R. Larus
Profile-Driven Instruction Level Parallel Scheduling with Application to Super Blocks | C. Chekuri, R. Johnson, R. Motwani, B. Natarajan, B. R. Rau, M. Schlansker
Speculative Hedge: Regulating Compile-Time Speculation Against Profile Variations | Brian L. Deitrich, Wen-mei W. Hwu
Hot Cold Optimization of Large Windows/NT Applications | Robert Cohn, P. Geoffrey Lowney
Java Bytecode to Native Code Translation: The Caffeine Prototype and Preliminary Results | Cheng-Hsueh A. Hsieh, John C. Gyllenhaal, Wen-mei W. Hwu
Analysis Techniques for Predicated Code | Richard Johnson, Michael Schlansker
Global Predicate Analysis and Its Application to Register Allocation | David M. Gillies, Dz-ching Roy Ju, Richard Johnson, Michael Schlansker
Modulo Scheduling of Loops in Control-Intensive Non-Numeric Programs | Daniel M. Lavery, Wen-mei W. Hwu
Assigning Confidence to Conditional Branch Predictions | Erik Jacobsen, Eric Rotenberg, J. E. Smith
Compiler Synthesized Dynamic Branch Prediction | Scott Mahlke, Balas Natarajan
Wrong-Path Instruction Prefetching | Jim Pierce, Trevor Mudge
Design Decisions Influencing the UltraSPARC's Instruction Fetch Architecture | Robert Yung
Increasing the Instruction Fetch Rate Via Block-Structured Instruction Set Architectures | Eric Hao, Po-Yung Chang, Marius Evers, Yale N. Patt
Instruction Fetch Mechanisms for VLIW Architectures with Compressed Encodings | Thomas M. Conte, Sanjeev Banerjia, Sergei Y. Larin, Kishore N. Menezes, Sumedh W. Sathaye
Tango: A Hardware-Based Data Prefetching Technique for Superscalar Processors | Shlomit S. Pinter, Adi Yoaz
The Performance Potential of Data Dependence Speculation & Collapsing | Yiannakis Sazeides, Stamatis Vassiliadis, James E. Smith
Heuristics for Register-Constrained Software Pipelining | Josep Llosa, Mateo Valero, Eduard Ayguadé
Software Pipelining Loops with Conditional Branches | Mark G. Stoodley, Corinna G. Lee
Combining Loop Transformations Considering Caches and Scheduling | Michael E. Wolf, Dror E. Maydan, Ding-Kai Chen
Instruction Scheduling and Executable Editing | Eric Schnarr, James R. Larus
Instruction Scheduling for the HP PA-8000 | David A. Dunn, Wei-Chung Hsu
Meld Scheduling: Relaxing Scheduling Constraints Across Region Boundaries | Santosh G. Abraham, Vinod Kathail, Brian L. Deitrich
Custom-Fit Processors: Letting Applications Define Architectures | Joseph A. Fisher, Paolo Faraboschi, Giuseppe Desoli
Optimization for a Superscalar Out-of-Order Machine | Anne M. Holler
Optimization of Machine Descriptions for Efficient Use | John C. Gyllenhaal, Wen-mei W. Hwu, B. Ramakrishna Rau

MICRO 1997

Paper Title | Authors
The Bi-Mode Branch Predictor | Chih-Chieh Lee, I-Cheng K. Chen, Trevor N. Mudge
Path-Based Next Trace Prediction | Quinn Jacobson, Eric Rotenberg, James E. Smith
Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism | Daniel Holmes Friendly, Sanjay Jeram Patel, Yale N. Patt
Reducing the Performance Impact of Instruction Cache Misses by Writing Instructions Into the Reservation Stations Out-of-Order | Jared Stark, Paul Racunas, Yale N. Patt
On High-Bandwidth Data Cache Design for Multi-Issue Processors | Jude A. Rivers, Gary S. Tyson, Edward S. Davidson, Todd M. Austin
Run-Time Spatial Locality Detection and Optimization | Teresa L. Johnson, Matthew C. Merten, Wen-Mei W. Hwu
A Comparison of Data Prefetching On an Access Decoupled and Superscalar Machine | G. P. Jones, N. P. Topham
The Design and Performance of a Conflict-Avoiding Cache | Nigel Topham, Antonio González, José González
Prediction Caches for Superscalar Processors | James E. Bennett, Michael J. Flynn
A Framework for Balancing Control Flow and Predication | David I. August, Wen-mei W. Hwu, Scott A. Mahlke
Evaluation of Scheduling Techniques On a SPARC-Based VLIW Testbed | Seongbae Park, SangMin Shim, Soo-Mook Moon
Tuning Compiler Optimizations for Simultaneous Multithreading | Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay S. Parekh, Dean M. Tullsen
Exploiting Dead Value Information | Milo M. Martin, Amir Roth, Charles N. Fischer
Trace Processors | Eric Rotenberg, Quinn Jacobson, Yiannakis Sazeides, Jim Smith
The Multicluster Architecture: Reducing Cycle Time Through Partitioning | Keith I. Farkas, Paul Chow, Norman P. Jouppi, Zvonko Vranesic
Out-of-Order Vector Architectures | Roger Espasa, Mateo Valero, James E. Smith
Initial Results On the Performance and Cost of Vector Microprocessors | Corinna G. Lee, Derek J. DeVries
The Filter Cache: An Energy Efficient Memory Structure | Johnson Kin, Munish Gupta, William H. Mangione-Smith
Improving Code Density Using Compression Techniques | Charles Lefurgy, Peter Bird, I-Cheng Chen, Trevor Mudge
Procedure Based Program Compression | Darko Kirovski, Johnson Kin, William H. Mangione-Smith
Improving the Accuracy and Performance of Memory Communication Through Renaming | Gary S. Tyson, Todd M. Austin
Microarchitecture Support for Improving the Performance of Load Target Prediction | Chung-Ho Chen, Akida Wu
Streamlining Inter-Operation Memory Communication Via Data Dependence Prediction | Andreas Moshovos, Gurindar S. Sohi
The Predictability of Data Values | Yiannakis Sazeides, James E. Smith
Value Profiling | Brad Calder, Peter Feller, Alan Eustace
Can Program Profiling Support Value Prediction? | Freddy Gabbay, Avi Mendelson
Highly Accurate Data Value Prediction Using Hybrid Predictors | Kai Wang, Manoj Franklin
ProfileMe: Hardware Support for Instruction-Level Profiling On Out-of-Order Processors | Jeffrey Dean, James E. Hicks, Carl A. Waldspurger, William E. Weihl, George Chrysos
Procedure Placement Using Temporal Ordering Information | Nikolas Gloy, Trevor Blackwell, Michael D. Smith, Brad Calder
Predicting Data Cache Misses in Non-Numeric Applications Through Correlation Profiling | Todd C. Mowry, Chi-Keung Luk
Available Parallelism in Video Applications | Heng Liao, Andrew Wolfe
MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communicatons Systems | Chunho Lee, Miodrag Potkonjak, William H. Mangione-Smith
Cache Sensitive Modulo Scheduling | F. Jesús Sánchez, Antonio González
Unroll-and-Jam Using Uniformly Generated Sets | Steve Carr, Yiping Guan
Resource-Sensitive Profile-Directed Data Flow Analysis for Code Optimization | Rajiv Gupta, David A. Berson, Jesse Z. Fang

MICRO 1998

Paper Title | Authors
A Bandwidth-Efficient Architecture for Media Processing | Scott Rixner, William J. Dally, Ujval J. Kapasi, Brucek Khailany, Abelardo López-Lagunas, Peter R. Mattson, John D. Owens
Exploiting Instruction Level Parallelism in Geometry Processing for Three Dimensional Graphics Applications | Chia-Lin Yang, Barton Sano, Alvin R. Lebeck
Simple Vector Microprocessors for Multimedia Applications | Corinna G. Lee, Mark G. Stoodley
Evaluating MMX Technology Using DSP and Multimedia Applications | Ravi Bhargava, Lizy K. John, Brian L. Evans, Ramesh Radhakrishnan
Analyzing the Working Set Characteristics of Branch Execution | Sangwook P. Kim, Gary S. Tyson
Dataflow Analysis of Branch Mispredictions and Its Application to Early Resolution of Branch Outcomes | Alexandre Farcy, Olivier Temam, Roger Espasa, Toni Juan
The YAGS Branch Prediction Scheme | Avinoam N. Eden, Trevor Mudge
Task Selection for a Multiscalar Processor | T. N. Vijaykumar, Gurindar S. Sohi
Split-Path Enhanced Pipeline Scheduling for Loops with Control Flows | SangMin Shim, Soo-Mook Moon
Effective Cluster Assignment for Modulo Scheduling | Erik Nystrom, Alexandre E. Eichenberger
Better Global Scheduling Using Path Profiles | Cliff Young, Michael D. Smith
Predictive Techniques for Aggressive Load Speculation | Glenn Reinman, Brad Calder
Compiler-Directed Early Load-Address Generation | Ben-Chung Cheng, Daniel A. Connors, Wen-mei W. Hwu
Load Latency Tolerance in Dynamically Scheduled Processors | Srikanth T. Srinivasan, Alvin R. Lebeck
Improving I/O Performance with a Conditional Store Buffer | Lambert Schaelicke, Al Davis
Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors | Daniel Holmes Friendly, Sanjay Jeram Patel, Yale N. Patt
Cooperative Prefetching: Compiler and Hardware Support for Effective Instruction Prefetching in Modern Processors | Chi-Keung Luk, Todd C. Mowry
Code Compression Based on Operand Factorization | Guido Araujo, Paulo Centoducatte, Mario Cortes, Ricardo Pannain
Understanding the Differences Between Value Prediction and Instruction Reuse | Avinash Sodani, Gurindar S. Sohi
A Novel Renaming Scheme to Exploit Value Temporal Locality Through Physical Register Reuse and Unification | Stephen Jourdan, Ronny Ronen, Michael Bekerman, Bishara Shomar, Adi Yoaz
A Dynamic Multithreading Processor | Haitham Akkary, Michael A. Driscoll
Widening Resources: A Cost-Effective Technique for Aggressive ILP Architectures | David López, Josep Llosa, Mateo Valero, Eduard Ayguadé
The Cascaded Predictor: Economical and Adaptive Branch Target Prediction | Karel Driesen, Urs Hölzle
Improving Prediction for Procedure Returns with Return-Address-Stack Repair Mechanisms | Kevin Skadron, Pritpal S. Ahuja, Margaret Martonosi, Douglas W. Clark
Predicting Indirect Branches via Data Compression | John Kalamatianos, David R. Kaeli
Improving Locality Using Loop and Data Transformations in an Integrated Framework | Mahmut Kandemir, Alok Choudhary, J. Ramanujam, Prithviraj Banerjee
Precise Register Allocation for Irregular Architectures | Timothy Kong, Kent D. Wilken
Unified Assign and Schedule: A New Approach to Scheduling for Clustered Register File Microarchitectures | Emre Özer, Sanjeev Banerjia, Thomas M. Conte

MICRO 1999

Paper Title | Authors
Control Independence in Trace Processors | Eric Rotenberg, James E. Smith
Fetch Directed Instruction Prefetching | Glenn Reinman, Brad Calder, Todd M. Austin
Improving Branch Predictors by Correlating on Data Values | Timothy H. Heil, Zak Smith, James E. Smith
Instruction Fetch Mechanisms for Multipath Execution Processors | Artur Klauser, Dirk Grunwald
A Superscalar 3D Graphics Engine | Andrew Wolfe, Derek B. Noonburg
Dynamic 3D Graphics Workload Characterization and the Architectural Implications | Tulika Mitra, Tzi-cker Chiueh
Exploiting a New Level of DLP in Multimedia Applications | Jesus Corbal, Roger Espasa, Mateo Valero
Compiler-Driven Cached Code Compression Schemes for Embedded ILP Processors | Sergei Y. Larin, Thomas M. Conte
Evaluation of a High Performance Code Compression Method | Charles Lefurgy, Eva Piccininni, Trevor N. Mudge
Low-Cost Branch Folding for Embedded Applications with Small Tight Loops | Lea Hwang Lee, Jeff Scott, Bill Moyer, John Arends
Automatic and Efficient Evaluation of Memory Hierarchies for Embedded Systems | Santosh G. Abraham, Scott A. Mahlke
Hardware Identification of Cache Conflict Misses | Jamison D. Collins, Dean M. Tullsen
Access Region Locality for High-Bandwidth Processor Memory System Design | Sangyeun Cho, Pen-Chung Yew, Gyungho Lee
Code Transformations to Improve Memory Parallelism | Vijay S. Pai, Sarita V. Adve
Compiler-Directed Dynamic Computation Reuse: Rationale and Initial Results | Daniel A. Connors, Wen-mei W. Hwu
Dynamic Memory Disambiguation in the Presence of Out-of-Order Store Issuing | Soner Onder, Rajiv Gupta
Read-After-Read Memory Dependence Prediction | Andreas Moshovos, Gurindar S. Sohi
Delaying Physical Register Allocation through Virtual-Physical Registers | Teresa Monreal, Antonio González, Mateo Valero, José González, Victor Viñals
DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design | Todd M. Austin
Exploiting ILP in Page-based Intelligent Memory | Mark Oskin, Justin Hensley, Diana Keen, Frederic T. Chong, Matthew K. Farrens, Aneet Chopra
The Use of Multithreading for Exception Handling | Craig B. Zilles, Joel S. Emer, Gurindar S. Sohi
Value Prediction for Speculative Multithreaded Architectures | Pedro Marcuello, Jordi Tubella, Antonio González
Predicting the Usefulness of a Block Result: A Micro-Architectural Technique for High-Performance Low-Power Processors | Enric Musoll
Selective Cache Ways: On-Demand Cache Resource Allocation | David H. Albonesi
Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs | Jay Bharadwaj, Kishore N. Menezes, Chris McKinsey
Balance Scheduling: Weighting Branch Tradeoffs in Superblocks | Alexandre E. Eichenberger, Waleed Meleis
Optimizations and Oracle Parallelism with Dynamic Translation | Kemal Ebcioglu, Erik R. Altman, Sumedh W. Sathaye, Michael Gschwind

MICRO 2000

Paper Title | Authors
Eager Writeback - A Technique for Improving Bandwidth Utilization | Hsien-Hsin S. Lee, Gary S. Tyson, Matthew K. Farrens
Silent Stores for Free | Kevin M. Lepak, Mikko H. Lipasti
A Permutation-Based Page Interleaving Scheme to Reduce Row-Buffer Conflicts and Exploit Data Locality | Zhao Zhang, Zhichun Zhu, Xiaodong Zhang
Predictor-Directed Stream Buffers | Timothy Sherwood, Suleyman Sair, Brad Calder
On Pipelining Dynamic Instruction Scheduling Logic | Jared Stark, Mary D. Brown, Yale N. Patt
The Impact of Delay on the Design of Branch Predictors | Daniel A. Jiménez, Stephen W. Keckler, Calvin Lin
Improving BTB Performance in the Presence of DLLs | Stevan A. Vlaovic, Edward S. Davidson, Gary S. Tyson
Efficient Checker Processor Design | Saugata Chatterjee, Christopher T. Weaver, Todd M. Austin
An Integrated Approach to Accelerate Data and Predicate Computations in Hyperblocks | Alexandre E. Eichenberger, Waleed Meleis, Suman Maradani
Accurate and Efficient Predicate Analysis with Binary Decision Diagrams | John W. Sias, Wen-mei W. Hwu, David I. August
Modulo Scheduling for a Fully-Distributed Clustered VLIW Architecture | F. Jesús Sánchez, Antonio González
Two-Level Hierarchical Register File Organization for VLIW Processors | Javier Zalamea, Josep Llosa, Eduard Ayguadé, Mateo Valero
PipeRench Implementation of the Instruction Path Coprocessor | Yuan C. Chou, Pazhani Pillai, Herman Schmit, John Paul Shen
Efficient Conditional Operations for Data-Parallel Architectures | Ujval J. Kapasi, William J. Dally, Scott Rixner, Peter R. Mattson, John D. Owens, Brucek Khailany
Flexible Hardware Acceleration for Multimedia Oriented Microprocessors | Frederik Vermeulen, Lode Nachtergaele, Francky Catthoor, Diederik Verkest, Hugo De Man
Very Low Power Pipelines Using Significance Compression | Ramon Canal, Antonio González, James E. Smith
A Static Power Model for Architects | J. Adam Butts, Gurindar S. Sohi
A Framework for Dynamic Energy Efficiency and Temperature Management | Michael C. Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas
Dynamic Zero Compression for Cache Energy Reduction | Luis Villa, Michael Zhang, Krste Asanovic
Register Integration: A Simple and Efficient Implementation of Squash Reuse | Amir Roth, Gurindar S. Sohi
The Store-Load Address Table and Speculative Register Promotion | Matt Postiff, David A. Greene, Trevor N. Mudge
Memory Hierarchy Reconfiguration for Energy and Performance in General-Purpose Processor Architectures | Rajeev Balasubramonian, David H. Albonesi, Alper Buyuktosunoglu, Sandhya Dwarkadas
Frequent Value Compression in Data Caches | Jun Yang, Youtao Zhang, Rajiv Gupta
A Study of Slipstream Processors | Zachary Purser, Karthik Sundaramoorthy, Eric Rotenberg
Relational Profiling: Enabling Thread-Level Parallelism in Virtual Machines | Timothy H. Heil, James E. Smith
Calpa: A Tool for Automating Selective Dynamic Compilation | Markus Mock, Craig Chambers, Susan J. Eggers
Increasing the Size of Atomic Instruction Blocks Using Control Flow Assertions | Sanjay J. Patel, Tony Tung, Satarupa Bose, Matthew M. Crum
Reducing Wire Delay Penalty Through Value Prediction | Joan-Manuel Parcerisa, Antonio González
Compiler Controlled Value Prediction Using Branch Predictor Based Confidence | Eric Larson, Todd M. Austin
Instruction Distribution Heuristics for Quad-Cluster, Dynamically-Scheduled, Superscalar Processors | Amirali Baniasadi, Andreas Moshovos
Performance Improvement with Circuit-Level Speculation | Tong Liu, Shih-Lien Lu