Research Terms
The Lookahead Instruction Fetch Engine (LIFE) provides a mechanism to guarantee instruction fetch behavior so that accesses to fetch-associated structures can be avoided, including the level-one instruction cache (L1 IC), instruction translation lookaside buffer (ITLB), branch predictor (BP), branch target buffer (BTB), and return address stack (RAS). Systems and methods may be provided for lookahead instruction fetching for processors. The systems and methods may include an L1 instruction cache comprising a plurality of lines of data, where each line of data may include one or more instructions. The systems and methods may also include a tagless hit instruction cache that may store a subset of the lines of data in the L1 instruction cache. Instructions in the lines of data stored in the tagless hit instruction cache may be stored with metadata indicating whether the next instruction is guaranteed to reside in the tagless hit instruction cache. An instruction fetcher may be arranged to have direct access to both the L1 instruction cache and the tagless hit instruction cache, and the tagless hit instruction cache may be arranged to have direct access to the L1 instruction cache.
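A minimal, sequential-only sketch of this fetch path is given below in C. The sizes, structure fields, and whole-line refill policy are illustrative assumptions rather than the claimed design; taken branches are not modeled, and the ITLB, BP, BTB, and RAS appear only in that a guaranteed hit skips them.

```c
/* Sketch of a tagless hit instruction cache (TH IC) fetch path.
 * Sizes, names, and the refill policy are assumptions for illustration. */
#include <stdbool.h>
#include <stdint.h>

#define THIC_LINES     16   /* assumed number of TH IC lines */
#define INSTS_PER_LINE  8   /* assumed instructions per line */

typedef struct {
    uint32_t insts[INSTS_PER_LINE];           /* copy of an L1 IC line    */
    bool     next_guaranteed[INSTS_PER_LINE]; /* metadata: next sequential
                                                 instruction is resident  */
    bool     valid;
} thic_line_t;

static thic_line_t thic[THIC_LINES];
static bool next_fetch_guaranteed = false;    /* carried between fetches  */

/* Stub for an L1 IC read; a real fetch would also consult the ITLB. */
static uint32_t l1_ic_read(uint32_t pc) { return 0xD503201Fu ^ pc; }

/* Copy a whole line from the L1 IC and mark every instruction except the
 * last as having its sequential successor resident in the TH IC. */
static void thic_refill(uint32_t line, uint32_t line_base)
{
    for (uint32_t i = 0; i < INSTS_PER_LINE; i++) {
        thic[line].insts[i] = l1_ic_read(line_base + i);
        thic[line].next_guaranteed[i] = (i + 1 < INSTS_PER_LINE);
    }
    thic[line].valid = true;
}

/* Fetch one instruction (pc counts instructions, not bytes).  When the
 * previous fetch guaranteed that this instruction is resident, the L1 IC,
 * ITLB, BP, BTB, and RAS are not accessed at all. */
uint32_t fetch(uint32_t pc, bool *bypassed_fetch_structures)
{
    uint32_t line = (pc / INSTS_PER_LINE) % THIC_LINES;
    uint32_t off  = pc % INSTS_PER_LINE;

    if (next_fetch_guaranteed) {
        *bypassed_fetch_structures = true;    /* guaranteed TH IC hit     */
    } else {
        *bypassed_fetch_structures = false;   /* normal fetch plus refill */
        thic_refill(line, pc - off);
    }
    next_fetch_guaranteed = thic[line].valid && thic[line].next_guaranteed[off];
    return thic[line].insts[off];
}
```

In this sketch, sequential fetches within a refilled line bypass every fetch-associated structure, which is the source of the energy savings discussed next.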
LIFE can reduce both energy consumption and power requirements with no or negligible impact on application execution time. It can be used to reduce energy consumption in embedded processors to extend battery life, and to decrease the power requirements of general-purpose processors to help address heat issues. Unlike most energy-saving features, LIFE does not come at the cost of increased execution time. It represents a significant improvement over the state of the art: it can extend battery life, making mobile computing more practical, and it can allow general-purpose processors to run at a faster clock rate while generating a similar amount of heat.
The need for energy efficiency continues to grow for many classes of processors, including those for which performance remains vital. Not only is the data cache crucial for good performance, but it also represents a significant portion of the processor's energy expenditure. We describe the implementation and use of a tagless access buffer (TAB) that greatly improves data-access energy efficiency while slightly improving performance.
The compiler recognizes memory reference patterns within loops and allocates these references to a small TAB (a sketch of how a TAB entry serves such a reference stream follows the list below). This combined hardware/software approach reduces energy usage by
(1) replacing many first-level data cache (L1D) accesses with accesses to the smaller, more power-efficient TAB;
(2) removing the need to perform tag checks or data translation lookaside buffer (DTLB) lookups for TAB accesses; and
(3) minimizing DTLB lookups when transferring data between the L1D and the TAB.
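The following C sketch models a single TAB entry serving one strided reference stream, under an assumed 32-byte line size. The compiler-inserted instructions that acquire and release TAB entries, and the annotated loads that target them, are not shown; the runtime base-address comparison here stands in for the compiler's stride analysis, which in the approach described above guarantees residence without any per-access check.

```c
/* Sketch of one TAB entry serving a strided reference stream.  Line size,
 * field names, and the residence check are illustrative simplifications. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u                  /* assumed L1D line size          */

typedef struct {
    uint8_t  line[LINE_BYTES];          /* buffered copy of one L1D line  */
    uint32_t line_base;                 /* address of the buffered line   */
    bool     valid;
    uint32_t dtlb_lookups;              /* one per line transfer          */
} tab_entry_t;

/* Stub for an L1D-to-TAB line transfer; a real transfer performs a single
 * DTLB lookup for the whole line rather than one per access. */
static void l1d_fill_line(tab_entry_t *e, uint32_t base)
{
    memset(e->line, 0, LINE_BYTES);     /* placeholder line contents      */
    e->line_base = base;
    e->valid = true;
    e->dtlb_lookups++;
}

/* Load a 32-bit word through the TAB: no tag check and no per-access DTLB
 * lookup, just an offset into the buffered line. */
uint32_t tab_load(tab_entry_t *e, uint32_t addr)
{
    uint32_t base = addr & ~(LINE_BYTES - 1);
    if (!e->valid || base != e->line_base)
        l1d_fill_line(e, base);         /* next line of the stream        */
    uint32_t word;
    memcpy(&word, &e->line[addr - base], sizeof word);
    return word;
}
```

For a loop such as `for (i = 0; i < n; i++) sum += tab_load(&t, base + 4 * i);`, the entry's dtlb_lookups counter in this sketch advances only once per 32-byte line (once every eight iterations) rather than on every access.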
Accesses to the TAB occur earlier in the pipeline, and data lines are automatically prefetched from lower memory levels, which results in a small performance improvement. In addition, many unnecessary block transfers between other memory hierarchy levels can be avoided by characterizing how the data in the TAB is used.
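As an illustration of this usage characterization (the flag names are assumptions, not the paper's terminology), per-stream read/write information can decide whether a TAB line must be fetched from the L1D at all and whether it must be written back when the entry is released:

```c
/* Illustrative usage flags for a TAB-resident stream; names are assumed. */
typedef enum {
    TAB_READ            = 1 << 0,  /* the loop reads from this stream     */
    TAB_WRITE           = 1 << 1,  /* the loop writes to this stream      */
    TAB_FULL_LINE_WRITE = 1 << 2   /* every byte of each line is written  */
} tab_usage_t;

/* A line needs to be fetched from the L1D only if its old contents can be
 * observed: it is read, or it is written but not completely overwritten. */
int need_fill_from_l1d(unsigned usage)
{
    return (usage & TAB_READ) || !(usage & TAB_FULL_LINE_WRITE);
}

/* A line needs to be written back to the L1D only if the TAB modified it. */
int need_writeback_to_l1d(unsigned usage)
{
    return (usage & TAB_WRITE) != 0;
}
```

In this sketch, a write-only stream whose lines are completely overwritten (for example, initializing an output array) incurs no fill traffic from the L1D, and a read-only stream incurs no write-back traffic.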
With a combined size equal to that of a conventional 32-entry register file, a four-entry TAB eliminates 40% of L1D accesses and 42% of DTLB accesses, on average. This configuration reduces data-access-related energy by 35% while simultaneously decreasing execution time by 3%.