Fetching instruction streams
Authors:
Alex Ramirez, Oliverio J. Santana, Josep L. Larriba-Pey and Mateo Valero
UPC-Barcelona
Jordi Girona 1-3, D6
08980 Barcelona (Spain)
Abstract:
Fetch performance is a critical factor because it effectively limits
overall processor performance. However, there is little performance
advantage in increasing front-end performance beyond what the back-end
can consume. For each processor design, the target is to build the best
possible fetch engine for the required performance level. A fetch engine
will be better if it provides better performance, but also if it takes
fewer resources, requires less chip area, or consumes less power.
In this paper we propose a novel fetch architecture based on the execution
of long streams of sequential instructions, taking maximum advantage of
code layout optimizations.
We describe our architecture in detail and show that it is less complex
and requires fewer resources than other high-performance fetch
architectures like the trace cache, while providing high fetch performance
suitable for wide-issue superscalar processors.
Our results show that our fetch architecture, combined with code layout
optimizations, obtains 10% higher performance than the EV8 fetch
architecture and 4% higher performance than the FTB architecture using
state-of-the-art branch predictors, while being only 1.5% slower than the
trace cache. Even in the absence of code layout optimizations, fetching
instruction streams is still 10% faster than the EV8, and only 4% slower
than the trace cache.
Fetching instruction streams effectively exploits the special
characteristics of layout-optimized code to provide high fetch
performance, close to that of a trace cache, at a much lower cost
and complexity, similar to that of a basic block architecture.
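
As an illustration only, the notion of a "stream of sequential instructions" can be sketched in a few lines of Python. The abstract does not spell out the definition, so this sketch assumes that a stream runs from the target of one taken branch up to and including the next taken branch, with not-taken branches leaving the stream unbroken. The names DynInst and split_into_streams are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class DynInst(NamedTuple):
    """One dynamically executed instruction (hypothetical trace record)."""
    pc: int       # instruction address
    taken: bool   # True only if this instruction is a taken branch


def split_into_streams(trace: List[DynInst]) -> List[Tuple[int, int]]:
    """Return (start_pc, length) for each stream in a dynamic trace.

    Assumption: a stream ends at a taken branch, and the next
    instruction in the trace starts a new stream.
    """
    streams: List[Tuple[int, int]] = []
    start = None
    length = 0
    for inst in trace:
        if start is None:
            start = inst.pc          # first instruction of a new stream
        length += 1
        if inst.taken:               # a taken branch closes the stream
            streams.append((start, length))
            start, length = None, 0
    if start is not None:            # trailing stream with no taken branch
        streams.append((start, length))
    return streams


trace = [
    DynInst(0x100, False),
    DynInst(0x104, False),  # e.g. a not-taken branch: the stream continues
    DynInst(0x108, True),   # taken branch: the stream ends here
    DynInst(0x200, False),  # branch target: a new stream begins
    DynInst(0x204, True),
]
print(split_into_streams(trace))  # [(256, 3), (512, 2)]
```

Under this assumption, a layout that turns frequently taken branches into not-taken ones lengthens the streams, which is the property the abstract says the architecture exploits.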
Web Site:
http://personals.ac.upc.es/aramirez/papers/index.html