A Quantitative Framework for Automated Pre-Execution Thread Selection
Authors:
Amir Roth
Department of Computer and Information Science
University of Pennsylvania
Gurindar S. Sohi
Computer Sciences Department
University of Wisconsin-Madison
Abstract
Pre-execution attacks cache misses for which address-prediction-driven
prefetching fails. In pre-execution, copies
of cache miss computations are isolated from the main
program and launched as separate threads called p-threads
whenever the processor anticipates an upcoming
miss. P-thread selection is the task of deciding what computations
should execute as p-threads and when they
should be launched such that total execution time is minimized.
It is central to the success of pre-execution.
We introduce a framework for automated static p-thread
selection, a static p-thread being one whose
dynamic instances are repeatedly launched during the
course of program execution. Our approach is to formalize
the problem quantitatively and then apply standard techniques
to solve it analytically. The framework has two
novel components. The slice tree is a data structure that
compactly represents a set of static p-threads and the relationships
among them. Aggregate advantage is a formula
that uses raw program statistics and computation structure
to assign each candidate static p-thread a numeric score
based on estimated latency tolerance and overhead aggregated
over its expected dynamic executions.
We use the framework to select p-threads that cover L2
misses and study its effectiveness under different conditions
via detailed simulation. We measure the effect of constraining
p-thread length, locally optimizing p-threads,
using different program samples as a statistical basis for
selection, and varying several machine parameters. Our
framework responds to these changes in an intuitive way.
We also validate that aggregate advantage correctly models
actual pre-execution.