MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
SESSION: Best paper candidates
Large pages and lightweight memory management in virtualized environments: can you have it both ways?
Binh Pham
Ján Veselý
Gabriel H. Loh
Abhishek Bhattacharjee
Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems
Guowei Zhang
Webb Horn
Daniel Sanchez
CCICheck: using µhb graphs to verify the coherence-consistency interface
Yatin A. Manerkar
Daniel Lustig
Michael Pellauer
Margaret Martonosi
SESSION: Cache
HyComp: a hybrid cache compression method for selection of data-type-specific compression methods
Angelos Arelakis
Fredrik Dahlgren
Per Stenstrom
Doppelgänger: a cache for approximate computing
Joshua San Miguel
Jorge Albericio
Andreas Moshovos
Natalie Enright Jerger
The application slowdown model: quantifying and controlling the impact of inter-application interference at shared caches and main memory
Lavanya Subramanian
Vivek Seshadri
Arnab Ghosh
Samira Khan
Onur Mutlu
MORC: a manycore-oriented compressed cache
Tri M. Nguyen
David Wentzlaff
SESSION: Security
Avoiding information leakage in the memory controller with fixed service policies
Ali Shafiee
Akhila Gundu
Manjunath Shevgoor
Rajeev Balasubramonian
Mohit Tiwari
Fork path: improving efficiency of ORAM by removing redundant memory accesses
Xian Zhang
Guangyu Sun
Chao Zhang
Weiqi Zhang
Yun Liang
Tao Wang
Yiran Chen
Jia Di
Locking down insecure indirection with hardware-based control-data isolation
William Arthur
Sahil Madeka
Reetuparna Das
Todd Austin
Authenticache: harnessing cache ECC for system authentication
Anys Bacha
Radu Teodorescu
SESSION: Prefetching
Efficiently prefetching complex address patterns
Manjunath Shevgoor
Sahil Koladiya
Rajeev Balasubramonian
Chris Wilkerson
Seth H. Pugsley
Zeshan Chishti
Self-contained, accurate precomputation prefetching
Islam Atta
Xin Tong
Vijayalakshmi Srinivasan
Ioana Baldini
Andreas Moshovos
Confluence: unified instruction supply for scale-out servers
Cansu Kaynak
Boris Grot
Babak Falsafi
IMP: indirect memory prefetcher
Xiangyao Yu
Christopher J. Hughes
Nadathur Satish
Srinivas Devadas
SESSION: Concurrency
DeSC: decoupled supply-compute communication management for heterogeneous architectures
Tae Jun Ham
Juan L. Aragón
Margaret Martonosi
Efficient warp execution in presence of divergence with collaborative context collection
Farzad Khorasani
Rajiv Gupta
Laxmi N. Bhuyan
Control flow coalescing on a hybrid dataflow/von Neumann GPGPU
Dani Voitsechov
Yoav Etsion
A scalable architecture for ordered parallelism
Mark C. Jeffrey
Suvinay Subramanian
Cong Yan
Joel Emer
Daniel Sanchez
SESSION: DRAM
More is less: improving the energy efficiency of data movement via opportunistic use of sparse codes
Yanwei Song
Engin Ipek
Improving DRAM latency with dynamic asymmetric subarray
Shih-Lien Lu
Ying-Chen Lin
Chia-Lin Yang
Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses
Vivek Seshadri
Thomas Mullins
Amirali Boroumand
Onur Mutlu
Phillip B. Gibbons
Michael A. Kozuch
Todd C. Mowry
SESSION: Voltage
The CRISP performance model for dynamic voltage and frequency scaling in a GPGPU
Rajib Nath
Dean Tullsen
Safe limits on voltage reduction efficiency in GPUs: a direct measurement approach
Jingwen Leng
Alper Buyuktosunoglu
Ramon Bertran
Pradip Bose
Vijay Janapa Reddi
Adaptive guardband scheduling to improve system-level efficiency of the POWER7+
Yazhou Zu
Charles R. Lefurgy
Jingwen Leng
Matthew Halpern
Michael S. Floyd
Vijay Janapa Reddi
SESSION: Micro-architecture
DynaMOS: dynamic schedule migration for heterogeneous cores
Shruti Padmanabha
Andrew Lukefahr
Reetuparna Das
Scott Mahlke
Long term parking (LTP): criticality-aware resource allocation in OOO processors
Andreas Sembrant
Trevor Carlson
Erik Hagersten
David Black-Shaffer
Arthur Perais
André Seznec
Pierre Michaud
The inner most loop iteration counter: a new dimension in branch history
André Seznec
Joshua San Miguel
Jorge Albericio
Filtered runahead execution with a runahead buffer
Milad Hashemi
Yale N. Patt
Bungee jumps: accelerating indirect branches through HW/SW co-design
Daniel S. McFarlin
Craig Zilles
SESSION: GPU
SAWS: synchronization aware GPGPU warp scheduling for multiple independent warp schedulers
Jiwei Liu
Jun Yang
Rami Melhem
Enabling coordinated register allocation and thread-level parallelism optimization for GPUs
Xiaolong Xie
Yun Liang
Xiuhong Li
Yudong Wu
Guangyu Sun
Tao Wang
Dongrui Fan
Free launch: optimizing GPU dynamic kernel launches through thread reuse
Guoyang Chen
Xipeng Shen
GPU register file virtualization
Hyeran Jeon
Gokul Subramanian Ravi
Nam Sung Kim
Murali Annavaram
WarpPool: sharing requests with inter-warp coalescing for throughput processors
John Kloosterman
Jonathan Beaumont
Mick Wollman
Ankit Sethia
Ron Dreslinski
Trevor Mudge
Scott Mahlke
SESSION: Accelerator
Ultra-low power render-based collision detection for CPU/GPU systems
Enrique de Lucas
Pedro Marcuello
Joan-Manuel Parcerisa
Antonio González
Execution time prediction for energy-efficient hardware accelerators
Tao Chen
Alexander Rucker
G. Edward Suh
Border control: sandboxing accelerators
Lena E. Olson
Jason Power
Mark D. Hill
David A. Wood
Neural acceleration for GPU throughput processors
Amir Yazdanbakhsh
Jongse Park
Hardik Sharma
Pejman Lotfi-Kamran
Hadi Esmaeilzadeh
Neuromorphic accelerators: a comparison between neuroscience and machine-learning approaches
Zidong Du
Daniel D. Ben-Dayan Rubin
Yunji Chen
Liqiang He
Tianshi Chen
Lei Zhang
Chengyong Wu
Olivier Temam
SESSION: Mobile & emerging systems
Prediction-guided performance-energy trade-off for interactive applications
Daniel Lo
Taejoon Song
G. Edward Suh
Architecture-aware automatic computation offload for native applications
Gwangmu Lee
Hyunjoon Park
Seonyeong Heo
Kyung-Ah Chang
Hyogun Lee
Hanjun Kim
Fast support for unstructured data processing: the unified automata processor
Yuanwei Fang
Tung T. Hoang
Michela Becchi
Andrew A. Chien
Enabling interposer-based disintegration of multi-core processors
Ajaykumar Kannan
Natalie Enright Jerger
Gabriel H. Loh
DCS: a fast and scalable device-centric server architecture
Jaehyung Ahn
Dongup Kwon
Youngsok Kim
Mohammadamin Ajdari
Jaewon Lee
Jangwoo Kim
SESSION: Datacenter
Modeling the implications of DRAM failures and protection techniques on datacenter TCO
Panagiota Nikolaou
Yiannakis Sazeides
Lorena Ndreu
Marios Kleanthous
TimeTrader: exploiting latency tail to save datacenter energy for online search
Balajee Vamanan
Hamza Bin Sohail
Jahangir Hasan
T. N. Vijaykumar
Rubik: fast analytical power management for latency-critical systems
Harshad Kasture
Davide B. Bartolini
Nathan Beckmann
Daniel Sanchez
SESSION: Memory systems
CLEAN-ECC: high reliability ECC for adaptive granularity memory system
Seong-Lyong Gong
Minsoo Rhu
Jungrae Kim
Jinsuk Chung
Mattan Erez
vCache: architectural support for transparent and isolated virtual LLCs in virtualized environments
Daehoon Kim
Hwanju Kim
Nam Sung Kim
Jaehyuk Huh
An integrated concurrency and core-ISA architectural envelope definition, and test oracle, for IBM POWER multiprocessors
Kathryn E. Gray
Gabriel Kerneis
Dominic Mulligan
Christopher Pulte
Susmit Sarkar
Peter Sewell
SESSION: Coherence, consistency, persistency
Efficient GPU synchronization without scopes: saying no to complex consistency models
Matthew D. Sinclair
Johnathan Alsop
Sarita V. Adve
Efficient persist barriers for multicores
Arpit Joshi
Vijay Nagarajan
Marcelo Cintra
Stratis Viglas
ThyNVM: enabling software-transparent crash consistency in persistent memory systems
Jinglei Ren
Jishen Zhao
Samira Khan
Jongmoo Choi
Yongwei Wu
Onur Mutlu
Coherence domain restriction on large scale systems
Yaosheng Fu
Tri M. Nguyen
David Wentzlaff
Efficiently enforcing strong memory ordering in GPUs
Abhayendra Singh
Shaizeen Aga
Satish Narayanasamy
SESSION: Modeling & characterization
Characterizing, modeling, and improving the QoE of mobile devices with low battery level
Kaige Yan
Xingyao Zhang
Xin Fu
Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance
Newsha Ardalani
Clint Lestourgeon
Karthikeyan Sankaralingam
Xiaojin Zhu
A fast and accurate analytical technique to compute the AVF of sequential bits in a processor
Steven Raasch
Arijit Biswas
Jon Stephan
Paul Racunas
Joel Emer
Enabling portable energy efficiency with memory accelerated library
Qi Guo
Tze-Meng Low
Nikolaos Alachiotis
Berkin Akin
Larry Pileggi
James C. Hoe
Franz Franchetti
Microarchitectural implications of event-driven server-side web applications
Yuhao Zhu
Daniel Richins
Matthew Halpern
Vijay Janapa Reddi