A Quantitative Framework for Automated Pre-Execution Thread Selection



Pre-execution attacks cache misses for which address-prediction driven prefetching fails. In pre-execution, copies of cache miss computations are isolated from the main program and launched as separate threads called p-threads whenever the processor anticipates an upcoming miss. P-thread selection is the task of deciding what computations should execute as p-threads and when they should be launched such that total execution time is minimized. It is central to the success of pre-execution. We introduce a framework for automated static p-thread selection, a static p-thread being one whose dynamic instances are repeatedly launched during the course of program execution. Our approach is to formalize the problem quantitatively and then apply standard techniques to solve it analytically. The framework has two novel components. The slice tree is a data structure that compactly represents a set of static p-threads and the relationships among them. Aggregate advantage is a formula that uses raw program statistics and computation structure to assign each candidate static p-thread a numeric score based on estimated latency tolerance and overhead aggregated over its expected dynamic executions.

We use the framework to select p-threads that cover L2 misses and study its effectiveness under different conditions via detailed simulation. We measure the effect of constraining p-thread length, locally optimizing p-threads, using different program samples as a statistical basis for selection, and varying several machine parameters. Our framework responds to these changes in an intuitive way. We also validate that aggregate advantage correctly models actual pre-execution.