Eligible Papers

MICRO 2000

Paper Title | Authors
Eager Writeback – A Technique for Improving Bandwidth Utilization | Hsien-Hsin S. Lee, Gary S. Tyson, Matthew K. Farrens
Silent Stores for Free | Kevin M. Lepak, Mikko H. Lipasti
Predictor-Directed Stream Buffers | Timothy Sherwood, Suleyman Sair, Brad Calder
On Pipelining Dynamic Instruction Scheduling Logic | Jared Stark, Mary D. Brown, Yale N. Patt
The Impact of Delay on the Design of Branch Predictors | Daniel A. Jiménez, Stephen W. Keckler, Calvin Lin
Improving BTB Performance in the Presence of DLLs | Stevan A. Vlaovic, Edward S. Davidson, Gary S. Tyson
Efficient Checker Processor Design | Saugata Chatterjee, Christopher T. Weaver, Todd M. Austin
An Integrated Approach to Accelerate Data and Predicate Computations in Hyperblocks | Alexandre E. Eichenberger, Waleed Meleis, Suman Maradani
Accurate and Efficient Predicate Analysis with Binary Decision Diagrams | John W. Sias, Wen-mei W. Hwu, David I. August
Modulo Scheduling for a Fully-Distributed Clustered VLIW Architecture | F. Jesús Sánchez, Antonio González
Two-Level Hierarchical Register File Organization for VLIW Processors | Javier Zalamea, Josep Llosa, Eduard Ayguadé, Mateo Valero
PipeRench Implementation of the Instruction Path Coprocessor | Yuan C. Chou, Pazhani Pillai, Herman Schmit, John Paul Shen
Efficient Conditional Operations for Data-Parallel Architectures | Ujval J. Kapasi, William J. Dally, Scott Rixner, Peter R. Mattson, John D. Owens, Brucek Khailany
Flexible Hardware Acceleration for Multimedia Oriented Microprocessors | Frederik Vermeulen, Lode Nachtergaele, Francky Catthoor, Diederik Verkest, Hugo De Man
Very Low Power Pipelines Using Significance Compression | Ramon Canal, Antonio González, James E. Smith
A Static Power Model for Architects | J. Adam Butts, Gurindar S. Sohi
A Framework for Dynamic Energy Efficiency and Temperature Management | Michael C. Huang, Jose Renau, Seung-Moon Yoo, Josep Torrellas
Dynamic Zero Compression for Cache Energy Reduction | Luis Villa, Michael Zhang, Krste Asanovic
Register Integration: A Simple and Efficient Implementation of Squash Reuse | Amir Roth, Gurindar S. Sohi
The Store-Load Address Table and Speculative Register Promotion | Matt Postiff, David A. Greene, Trevor N. Mudge
Memory Hierarchy Reconfiguration for Energy and Performance in General-Purpose Processor Architectures | Rajeev Balasubramonian, David H. Albonesi, Alper Buyuktosunoglu, Sandhya Dwarkadas
Frequent Value Compression in Data Caches | Jun Yang, Youtao Zhang, Rajiv Gupta
A Study of Slipstream Processors | Zachary Purser, Karthik Sundaramoorthy, Eric Rotenberg
Relational Profiling: Enabling Thread-Level Parallelism in Virtual Machines | Timothy H. Heil, James E. Smith
Calpa: A Tool for Automating Selective Dynamic Compilation | Markus Mock, Craig Chambers, Susan J. Eggers
Increasing the Size of Atomic Instruction Blocks Using Control Flow Assertions | Sanjay J. Patel, Tony Tung, Satarupa Bose, Matthew M. Crum
Reducing Wire Delay Penalty Through Value Prediction | Joan-Manuel Parcerisa, Antonio González
Compiler Controlled Value Prediction Using Branch Predictor Based Confidence | Eric Larson, Todd M. Austin
Instruction Distribution Heuristics for Quad-Cluster, Dynamically-Scheduled, Superscalar Processors | Amirali Baniasadi, Andreas Moshovos
Performance Improvement with Circuit-Level Speculation | Tong Liu, Shih-Lien Lu

MICRO 2001

Paper Title | Authors
Skipper: A Microarchitecture for Exploiting Control-Flow Independence | Chen-Yong Cher, T. N. Vijaykumar
Performance Characterization of a Hardware Mechanism for Dynamic Optimization | Brian Fahs, Satarupa Bose, Matthew Crum, Brian Slechta, Francesco Spadini, Tony Tung, Sanjay J. Patel, Steven S. Lumetta
Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems | Eric Rotenberg
A Design Space Evaluation of Grid Processor Architectures | Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, Stephen W. Keckler
Reducing Set-Associative Cache Energy via Way-Prediction and Selective Direct-Mapping | Michael D. Powell, Amit Agarwal, T. N. Vijaykumar, Babak Falsafi, Kaushik Roy
A Code Decompression Architecture for VLIW Processors | Yuan Xie, Wayne Wolf, Haris Lekatsas
Direct Load: Dependence-Linked Dataflow Resolution of Load Address and Cache Coordinate | Byung-Kwon Chung, Jinsuo Zhang, Jih-Kwon Peir, Shih-Chang Lai, Konrad Lai
Reducing Power Requirements of Instruction Scheduling Through Dynamic Allocation of Multiple Datapath Resources | Dmitry Ponomarev, Gurhan Kucuk, Kanad Ghose
Exploiting VLIW Schedule Slacks for Dynamic and Leakage Energy Reduction | Wensheng Zhang, Vijaykrishnan Narayanan, Mahmut Kandemir, Mary Jane Irwin, David Duarte, Yuh-Fang Tsai
Reducing Power with Dynamic Critical Path Information | John S. Seng, Eric S. Tune, Dean M. Tullsen
Direct Addressed Caches for Reduced Power Consumption | Emmett Witchel, Sam Larsen, C. Scott Ananian, Krste Asanović
Modulo Schedule Buffers | Matthew C. Merten, Wen-mei W. Hwu
Graph-Partitioning Based Instruction Scheduling for Clustered Processors | Alex Aletà, Josep M. Codina, Jesús Sánchez, Antonio González
Modulo Scheduling with Integrated Register Spilling for Clustered VLIW Architectures | Javier Zalamea, Josep Llosa, Eduard Ayguadé, Mateo Valero
Efficient Static Single Assignment Form for Predication | Arthur Stoutchinin, Francois de Ferriere
The Impact of If-Conversion and Branch Prediction on Program Execution on the Intel® Itanium™ Processor | Youngsoo Choi, Allan Knies, Luke Gerke, Tin-Fook Ngai
Mapping Reference Code to Irregular DSPs Within the Retargetable, Optimizing Compiler COGEN(T) | Gary William Gréwal, Charles Thomas Wilson
Select-Free Instruction Scheduling Logic | Mary D. Brown, Jared Stark, Yale N. Patt
Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery | Joydeep Ray, James C. Hoe, Babak Falsafi
A High-Speed Dynamic Instruction Scheduling Scheme for Superscalar Processors | Masahiro Goshima, Kengo Nishino, Toshiaki Kitamura, Yasuhiko Nakashima, Shinji Tomita, Shin-ichiro Mori
Reducing the Complexity of the Register File in Dynamic Superscalar Processors | Rajeev Balasubramonian, Sandhya Dwarkadas, David H. Albonesi
Saving Energy with Architectural and Frequency Adaptations for Multimedia Applications | Christopher J. Hughes, Jayanth Srinivasan, Sarita V. Adve
Enhancing Loop Buffering of Media and Telecommunications Applications Using Low-Overhead Predication | John W. Sias, Hillery C. Hunter, Wen-mei W. Hwu
Cool-Cache for Hot Multimedia | Osman S. Unsal, Raksit Ashok, Israel Koren, C. Mani Krishna, Csaba Andras Moritz
ZR: A 3D API Transparent Technology for Chunk Rendering | Emile Hsieh, Vladimir Pentkovski, Thomas Piazza
Dynamic Speculative Precomputation | Jamison D. Collins, Dean M. Tullsen, Hong Wang, John P. Shen
Handling Long-Latency Loads in a Simultaneous Multithreading Processor | Dean M. Tullsen, Jeffery A. Brown
Correctly Implementing Value Prediction in Microprocessors That Support Multithreading or Multiprocessing | Milo M. K. Martin, Daniel J. Sorin, Harold W. Cain, Mark D. Hill, Mikko H. Lipasti

MICRO 2002

Paper Title | Authors
Vacuum Packing – Extracting Hardware-Detected Program Phases for Post-Link Optimization | Ronald D. Barnes, Erik M. Nystrom, Matthew C. Merten, Wen-mei W. Hwu
Power Protocol – Reducing Power Dissipation on Off-Chip Data Buses | K. Basu, Alok N. Choudhary, Jayaprakash Pisharath, Mahmut T. Kandemir
Hierarchical Scheduling Windows | Edward Brekelbaum, Jeff Rupley, Chris Wilkerson, Bryan Black
Characterizing and Predicting Value Degree of Use | J. Adam Butts, Gurindar S. Sohi
Microarchitectural Support for Precomputation Microthreads | Robert S. Chappell, Francis Tseng, Adi Yoaz, Yale N. Patt
Pointer Cache Assisted Prefetching | Jamison D. Collins, Suleyman Sair, Brad Calder, Dean M. Tullsen
Three-Dimensional Memory Vectorization for High Bandwidth Media Memory Systems | Jesús Corbal, Roger Espasa, Mateo Valero
DELI – A New Run-Time Control Point | Giuseppe Desoli, Nikolay Mateev, Evelyn Duesterwald, Paolo Faraboschi, Joseph A. Fisher
Managing Static Leakage Energy in Microprocessor Functional Units | Steve Dropsho, Volkan Kursun, David H. Albonesi, Sandhya Dwarkadas, Eby G. Friedman
A Faster Optimal Register Allocator | Changqing Fu, Kent D. Wilken
Effective Instruction Scheduling Techniques for an Interleaved Cache Clustered VLIW Processor | Enric Gibert, F. Jesús Sánchez, Antonio González
Microarchitectural Denial of Service – Insuring Microarchitectural Fairness | Dirk Grunwald, Soraya Ghiasi
Dynamic Addressing Memory Arrays with Physical Locality | Steven Hsu, Shih-Lien Lu, Shih-Chang Lai, Ram Krishnamurthy, Konrad Lai
Generating Physical Addresses Directly for Saving Instruction TLB Energy | Ismail Kadayif, Anand Sivasubramaniam, Mahmut T. Kandemir, Gokul B. Kandiraju, Guangyu Chen
Drowsy Instruction Caches – Leakage Power Reduction Using Dynamic Voltage Scaling and Cache Sub-Bank Prediction | Nam Sung Kim, Krisztián Flautner, David T. Blaauw, Trevor N. Mudge
Vector vs. Superscalar and VLIW Architectures for Embedded Multimedia Benchmarks | Christoforos E. Kozyrakis, David A. Patterson
Compiling for Instruction Cache Performance on a Multithreaded Architecture | Rakesh Kumar, Dean M. Tullsen
Convergent Scheduling | Walter Lee, Diego Puppin, Shane Swenson, Saman P. Amarasinghe
Reduced Code Size Modulo Scheduling in the Absence of Hardware Support | Josep Llosa, Stefan M. Freudenberger
Exploiting Data-Width Locality to Increase Superscalar Execution Bandwidth | Gabriel H. Loh
Cherry – Checkpointed Early Resource Recycling in Out-of-Order Microprocessors | José F. Martínez, Jose Renau, Michael C. Huang, Milos Prvulovic, Josep Torrellas
Instruction Fetch Deferral Using Static Slack | Gregory A. Muthler, David Crowe, Sanjay J. Patel, Steven Lumetta
Reducing Register Ports for Higher Speed and Lower Energy | Il Park, Michael D. Powell, T. N. Vijaykumar
Three Extensions to Register Integration | Vlad Petric, Anne Bracy, Amir Roth
Fetching Instruction Streams | Alex Ramírez, Oliverio J. Santana, Josep-Lluís Larriba-Pey, Mateo Valero
A Quantitative Framework for Automated Pre-Execution Thread Selection | Amir Roth, Gurindar S. Sohi
Dynamic Frequency and Voltage Control for a Multiple Clock Domain Microarchitecture | Greg Semeraro, David H. Albonesi, Steve Dropsho, Grigorios Magklis, Sandhya Dwarkadas, Michael L. Scott
Register Write Specialization Register Read Specialization – A Path to Complexity-Effective Wide-Issue Superscalar Processors | André Seznec, Eric Toullec, Olivier Rochecouste
Optimizing Pipelines for Power and Performance | Viji Srinivasan, David M. Brooks, Michael Gschwind, Pradip Bose, Victor V. Zyuban, Philip N. Strenski, Philip G. Emma
Using Modern Graphics Architectures for General-Purpose Computing – A Framework and Analysis | Chris J. Thompson, Sahngyun Hahn, Mark Oskin
Microarchitectural Exploration with Liberty | Manish Vachharajani, Neil Vachharajani, David A. Penry, Jason A. Blome, David I. August
Orion – A Power-Performance Simulator for Interconnection Networks | Hangsheng Wang, Xinping Zhu, Li-Shiuan Peh, Sharad Malik
Compiler Managed Micro-Cache Bypassing for High Performance EPIC Processors | Youfeng Wu, Ryan N. Rakvic, Li-Ling Chen, Chyi-Chang Miao, George Chrysos, Jesse Fang
Energy Efficient Frequent Value Data Cache Design | Jun Yang, Rajiv Gupta
Compiler-Directed Instruction Cache Leakage Optimization | Wei Zhang, Jie S. Hu, Vijay Degalahal, Mahmut T. Kandemir, Narayanan Vijaykrishnan, Mary Jane Irwin
Master/Slave Speculative Parallelization | Craig B. Zilles, Gurindar S. Sohi

MICRO 2003

Paper Title | Authors
VSV: L2-Miss-Driven Variable Supply-Voltage Scaling for Low Power | Hai Li, Chen-Yong Cher, T. N. Vijaykumar, Kaushik Roy
A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor | Shubhendu S. Mukherjee, Christopher T. Weaver, Joel S. Emer, Steven K. Reinhardt, Todd M. Austin
TLC: Transmission Line Caches | Bradford M. Beckmann, David A. Wood
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures | Zeshan Chishti, Michael D. Powell, T. N. Vijaykumar
Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches | Se-Hyun Yang, Babak Falsafi
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data | Canturk Isci, Margaret Martonosi
Power-Driven Design of Router Microarchitectures in On-Chip Networks | Hangsheng Wang, Li-Shiuan Peh, Sharad Malik
Optimum Power/Performance Pipeline Depth | Allan Hartstein, Thomas R. Puzak
Processor Acceleration Through Automated Instruction Set Customization | Nathan Clark, Hongtao Zhong, Scott A. Mahlke
The Reconfigurable Streaming Vector Processor (RSVP™) | Silviu M. S. A. Chiricescu, Ray Essick, Brian Lucas, Phil May, Kent Moat, Jim Norris, Michael A. Schuette, Ali Saidi
Scaling and Characterizing Database Workloads: Bridging the Gap Between Research and Practice | Richard A. Hankins, Trung A. Diep, Murali Annavaram, Brian Hirano, Harald Eri, Hubert Nueckel, John Paul Shen
Generational Cache Management of Code Traces in Dynamic Optimization Systems | Kim M. Hazelwood, Michael D. Smith
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System | Jiwei Lu, Howard Chen, Rao Fu, Wei-Chung Hsu, Bobbie Othmer, Pen-Chung Yew, Dong-yuan Chen
IA-32 Execution Layer: A Two-Phase Dynamic Translator Designed to Support IA-32 Applications on Itanium-Based Systems | Leonid Baraz, Tevi Devor, Orna Etzion, Shalom Goldenberg, Alex Skaletsky, Yun Wang, Yigel Zemach
LLVA: A Low-level Virtual Instruction Set Architecture | Vikram S. Adve, Chris Lattner, Michael Brukman, Anand Shukla, Brian Gaeke
Comparing Program Phase Detection Techniques | Ashutosh S. Dhodapkar, James E. Smith
Using Interaction Costs for Microarchitectural Bottleneck Analysis | Brian A. Fields, Rastislav Bodík, Mark D. Hill, Chris J. Newburn
Fast Path-Based Neural Branch Prediction | Daniel A. Jiménez
Hardware Support for Control Transfers in Code Caches | Ho-Seop Kim, James E. Smith
Exploiting Value Locality in Physical Register Files | Saisanthosh Balakrishnan, Gurindar S. Sohi
Macro-Op Scheduling: Relaxing Scheduling Loop Constraints | Ilhyun Kim, Mikko H. Lipasti
WaveScalar | Steven Swanson, Ken Michelson, Andrew Schwerin, Mark Oskin
Universal Mechanisms for Data-Parallel Architectures | Karthikeyan Sankaralingam, Stephen W. Keckler, William R. Mark, Doug Burger
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors | Enric Gibert, F. Jesús Sánchez, Antonio González
Instruction Replication for Clustered Microarchitectures | Alex Aletà, Josep M. Codina, Antonio González, David R. Kaeli
Efficient Memory Integrity Verification and Encryption for Secure Processors | G. Edward Suh, Dwaine E. Clarke, Blaise Gassend, Marten van Dijk, Srinivas Devadas
Fast Secure Processor for Inhibiting Software Piracy and Tampering | Jun Yang, Youtao Zhang, Lan Gao
IPStash: A Power-Efficient Memory Architecture for IP-Lookup | Stefanos Kaxiras, Georgios Keramidas
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers | Jorge García-Vidal, Jesús Corbal, Llorenç Cerdà, Mateo Valero
Beating In-Order Stalls with “Flea-Flicker” Two-Pass Pipelining | Ronald D. Barnes, Erik M. Nystrom, John W. Sias, Sanjay J. Patel, Nacho Navarro, Wen-mei W. Hwu
Scalable Hardware Memory Disambiguation for High ILP Processors | Simha Sethumadhavan, Rajagopalan Desikan, Doug Burger, Charles R. Moore, Stephen W. Keckler
Reducing Design Complexity of the Load/Store Queue | Il Park, Chong-liang Ooi, T. N. Vijaykumar
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors | Haitham Akkary, Ravi Rajwar, Srikanth T. Srinivasan

MICRO 2004

Paper Title | Authors
The Fuzzy Correlation Between Code and Performance Predictability | Murali Annavaram, Ryan N. Rakvic, Marzia Polito, Jean-Yves Bouguet, Richard A. Hankins, Bob Davies
Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery | David N. Armstrong, Hyesoon Kim, Onur Mutlu, Yale N. Patt
Cache Refill/Access Decoupling for Vector Machines | Christopher Batten, Ronny Krashinsky, Steve Gerding, Krste Asanovic
Managing Wire Delay in Large Chip-Multiprocessor Caches | Bradford M. Beckmann, David A. Wood
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth | Anne Bracy, Prashant Prahlad, Amir Roth
Automatic Synthesis of High-Speed Processor Simulators | Martin Burtscher, Ilya Ganusov
Dynamically Controlled Resource Allocation in SMT Processors | Francisco J. Cazorla, Alex Ramírez, Mateo Valero, Enrique Fernández
Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization | Nathan Clark, Manjunath Kudlur, Hyunchul Park, Scott A. Mahlke, Krisztián Flautner
Control Flow Optimization Via Dynamic Reconvergence Prediction | Jamison D. Collins, Dean M. Tullsen, Hong Wang
Minos: Control Data Attack Prevention Orthogonal to Memory Model | Jedidiah R. Crandall, Frederic T. Chong
A Hardware-Software Platform for Intrusion Prevention | Milenko Drinic, Darko Kirovski
Dynamically Trading Frequency for Complexity in a GALS Microprocessor | Steven G. Dropsho, Greg Semeraro, David H. Albonesi, Grigorios Magklis, Michael L. Scott
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure | Oguz Ergin, Deniz Balkan, Kanad Ghose, Dmitry V. Ponomarev
Compiler Optimizations for Transaction Processing Workloads on Itanium Linux Systems | Gerolf Hoflehner, Knud Kirkegaard, Rod Skinner, Daniel M. Lavery, Yong-Fong Lee, Wei Li
Adaptive History-Based Memory Schedulers | Ibrahim Hur, Calvin Lin
Conjoined-Core Chip Multiprocessing | Rakesh Kumar, Norman P. Jouppi, Dean M. Tullsen
A Case for Clumsy Packet Processors | Arindam Mallik, Gokhan Memik
Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation | Harish Patil, Robert S. Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, Anand Karunanidhi
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms | Daniel Gracia Pérez, Gilles Mouchard, Olivier Temam
Memory Controller Optimizations for Web Servers | Scott Rixner
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication | Peter G. Sassone, D. Scott Wills
Thermal Modeling, Characterization and Management of On-Chip Networks | Li Shang, Li-Shiuan Peh, Amit Kumar, Niraj K. Jha
Optimal Superblock Scheduling Using Enumeration | Ghassan Shobaki, Kent D. Wilken
Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures | Jared C. Smolens, Jangwoo Kim, James C. Hoe, Babak Falsafi
Hardware and Binary Modification Support for Code Pointer Protection From Buffer Overflow | Nathan Tuck, Brad Calder, George Varghese
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy | Eric Tune, Rakesh Kumar, Dean M. Tullsen, Brad Calder
RIFLE: An Architectural Framework for User-Centric Information-Flow Security | Neil Vachharajani, Matthew J. Bridges, Jonathan Chang, Ram Rangan, Guilherme Ottoni, Jason A. Blome, George A. Reis, Manish Vachharajani, David I. August
Whole Execution Traces | Xiangyu Zhang, Rajiv Gupta
AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants | Pin Zhou, Wei Liu, Long Fei, Shan Lu, Feng Qin, Yuanyuan Zhou, Samuel P. Midkiff, Josep Torrellas