Lightning Session Presentation Order
- FPB: Fine-grained Power Budgeting to Improve Write Throughput
of Multi-level Cell Phase Change Memory
- Leveraging Heterogeneity in DRAM Main Memories to Accelerate
Critical Word Access
- Transactional Memory Architecture and Implementation for IBM
System z
- Warped-DMR: Light-weight Error Detection for GPGPU
- The Performance Vulnerability of Architectural and
Non-architectural Arrays to Permanent Faults
- NoCAlert: An On-Line and Real-Time Fault Detection Mechanism
for Network-on-Chip Architectures
- Cache-Conscious Wavefront Scheduling
- Libra: Tailoring SIMD Execution using Heterogeneous Hardware
and Dynamic Configurability
- Unifying Primary Cache, Scratch, and Register File Memories in
a Throughput Processor
- Kernel Weaver: Automatically Fusing Database Primitives for
Efficient GPU Computation
- KnightShift: Scaling the Energy Proportionality Wall Through
Server-level Heterogeneity
- Rethinking DRAM Power Modes for Energy Proportionality
- CoScale: Coordinating CPU and Memory System DVFS in Server
Systems
- Predicting Performance Impact of DVFS for Realistic Memory
Systems
- Vector Extensions for Decision Support DBMS Acceleration
- NOC-Out: Microarchitecting a Scale-Out Processor
- SLICC: Self-Assembly of Instruction Cache Collectives for OLTP
Workloads
- Systematic Energy Characterization of CMP/SMT Processor Systems
via Automated Micro-Benchmarks
- AUDIT: Stress Testing the Automatic Way
- Accurate Fine-Grained Processor Power Proxies
- Fundamental Latency Trade-offs in Architecting DRAM Caches
- A Mostly-Clean DRAM Cache for Effective Hit Speculation and
Self-Balancing Dispatch
- CoLT: Coalesced Large-Reach TLBs
- NoRD: Node-Router Decoupling for Effective Power-gating of
On-Chip Routers
- Dynamic Reconfiguration of 3D Photonic On-chip Interconnects
for Maximizing Performance and Improving Fault Tolerance
- Addressing End-to-End Memory Access Latency in NoC-Based
Multicores
- MorphCore: An Energy-Efficient Microarchitecture for High
Performance ILP and High Throughput TLP
- Composite Cores: Pushing Heterogeneity into a Core
- Control-Flow Decoupling
- Spatiotemporal Coherence Tracking
- Predicting Coherence Communication by Tracking Synchronization
Points at Run Time
- Vulcan: Hardware Support for Detecting Sequential Consistency
Violations Dynamically
- Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the
Memory Hierarchy
- Improving Cache Management Policies Using Dynamic Reuse
Distances
- Kernel Partitioning of Streaming Applications: A Statistical
Approach to an NP-complete Problem
- Inferred Models for Dynamic and Sparse Hardware-Software Spaces
- SMARTQ: Software-Managed Alias Register Queue for Dynamic
Optimizations
- Profiling Data-Dependence to Assist Parallelization: Framework,
Scope, and Optimization
- Neural Acceleration for General-Purpose Approximate Programs
- Designing a Programmable Wire-Speed Regular-Expression Matching
Accelerator