Evaluating the Energy Efficiency of Trace Caches

Co, Michelle; Skadron, Kevin
Sequential trace caches are highly energy- and power-efficient. Fetch engines that include a sequential trace cache provide higher performance for approximately equal area, with significant energy and power savings. The results of our preliminary experiments show that sequential trace caches are a power-efficient design. Previous work has evaluated the trace cache design space with respect to performance. In addition, some previous work has evaluated power-efficiency techniques for trace caches. This work evaluates the trace cache design space considering not only performance but also energy and power. In addition, we compare fetch engine designs that include trace caches with fetch engine designs that have instruction caches only. We perform a set of fetch engine area and associativity experiments as well as a next trace predictor design space exploration. We find that when examining performance and average fetch power, fetch engines with trace caches may not seem appealing, but when examining energy-delay and energy-delay-squared, the benefits of a trace cache become clear. Even if average fetch power increases because of the larger fetch engine area, energy efficiency still improves with a trace cache thanks to faster execution and more opportunities for clock gating, making the trace cache superior in terms of energy-delay and energy-delay-squared products. Results of current experiments show that sequential trace cache designs compare very favorably to instruction-cache-only designs with respect to power and energy consumption. Our preliminary results show that, overall, sequential trace caches clearly outperform instruction-cache-only designs with better energy efficiency. Among the best design points of the fetch engines examined, a 343KB, 4-way set associative trace cache fetch engine outperforms a 292KB instruction-cache-only fetch engine by 5% for integer benchmarks and 1% for floating point benchmarks. 
In addition, it does so using 68.3% less average fetch power, 70.3% less energy, 67.7% less energy-delay, and 69.1% less energy-delay-squared than a 292KB
Note: Abstract extracted from PDF text
Date Received
University of Virginia, Department of Computer Science, 2003
Published Date
All rights reserved (no additional license for public reuse)
Libra Open Repository

