Optimization of Inter-Cache Traffic Entanglement in Tagless Caches with Tiling Opportunities
Date: 7th Aug 2020
Time: 02:00 PM
Venue: Google Meet (https://meet.google.com/nvi-vixt-crf)
PAST EVENT
Details
The need for large last-level caches is growing with the drastic increase in the working sets of high-end applications. Large caches, however, suffer from high access latency and large metadata requirements. Tagless caches were introduced to overcome these drawbacks. In a 4-level cache hierarchy, a tagless cache uses the L3 tags to locate data in the L4 cache and reduces the miss latency by skipping the L4. Tagless cache techniques require rigorous enforcement of a reverse inclusion property between the (n-1)th and nth level caches. The tagless cache concept is of particular interest because it reduces the total storage required for cache metadata, thereby increasing the capacity of the L4 cache. It also reduces the average miss latency at the L3 cache. However, prior tagless cache designs force the eviction of L4 data on an L3 eviction. We refer to this as inter-cache entanglement. We explore new cache organization policies that mitigate the overheads stemming from this inter-cache-level replacement entanglement. We also incorporate support for explicit tiling shapes that better match software access patterns, improving the spatial and temporal locality of large block allocations in many essential computational kernels.
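The entanglement described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the design presented in the talk: the class name `EntangledTaglessModel`, the LRU policy, and the one-region-per-tag granularity are all hypothetical choices made for illustration. It shows that when the only way to find L4 data is through an L3 tag entry, evicting that entry also discards the L4 data, so a later access must go to memory.

```python
from collections import OrderedDict


class EntangledTaglessModel:
    """Toy model (hypothetical): L4 data is reachable only via L3 tag entries."""

    def __init__(self, l3_tag_entries):
        self.capacity = l3_tag_entries
        self.l3_tags = OrderedDict()   # region -> L4 slot, kept in LRU order
        self.l4_data = {}              # L4 slot -> region currently stored there
        self.memory_fetches = 0
        self.forced_l4_evictions = 0
        self._next_slot = 0

    def access(self, region):
        if region in self.l3_tags:
            # The L3 tag locates the data in L4 directly; no L4 tag check.
            self.l3_tags.move_to_end(region)
            return "hit"
        if len(self.l3_tags) >= self.capacity:
            # Entanglement: dropping the L3 tag forces the L4 data out too,
            # because nothing else records where that data lives.
            _victim, slot = self.l3_tags.popitem(last=False)
            del self.l4_data[slot]
            self.forced_l4_evictions += 1
        slot = self._next_slot
        self._next_slot += 1
        self.l4_data[slot] = region
        self.l3_tags[region] = slot
        self.memory_fetches += 1
        return "miss"


model = EntangledTaglessModel(l3_tag_entries=2)
for region in [0, 1, 2, 0, 1, 2]:
    model.access(region)
```

In this trace, three regions cycle through two tag entries, so every access misses and each replacement after the first two also removes data from L4. A policy that weighs the L4 eviction cost when choosing the L3 victim is aimed at exactly this kind of pathology.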
To address entanglement overheads and pathologies, we propose new replacement policies and energy-friendly mechanisms for tagless LLCs, namely Restricted Block Caching (RBC) and Victim tag Buffer Caching (VBC), which efficiently incorporate L4 eviction costs into L3 replacement decisions. We evaluate our schemes on a range of software-tiled linear algebra kernels. RBC and VBC significantly reduce memory traffic and improve speedup over the baseline tagless cache. We also show that matching the shape of the hardware allocation for each tagless region's superblocks to the access order of the software tile improves latency by 13.4% over the baseline tagless cache and reduces memory traffic by 51% compared with linear superblocks.
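To see why the superblock shape matters for tiled kernels, the following sketch counts how many distinct superblocks one software tile touches under two mappings. All parameters are assumptions chosen for illustration (a row-major matrix with 1024 columns, 8 elements per cache block, 16 blocks per superblock, a 32x32 software tile); the function names are hypothetical and do not come from the talk.

```python
def linear_superblock(row, col, n_cols=1024, elems_per_block=8, blocks_per_sb=16):
    """Superblock = 16 consecutive cache blocks in the linear address space."""
    addr = row * n_cols + col              # row-major element index
    block = addr // elems_per_block
    return block // blocks_per_sb


def tiled_superblock(row, col, n_cols=1024, elems_per_block=8,
                     sb_rows=4, sb_block_cols=4):
    """Superblock = a 4-row by 4-block rectangle (also 16 blocks in total)."""
    block_col = col // elems_per_block
    sbs_per_row = (n_cols // elems_per_block) // sb_block_cols
    return (row // sb_rows) * sbs_per_row + block_col // sb_block_cols


def superblocks_touched(mapper, tile_rows, tile_cols, row0=0, col0=0):
    """Number of distinct superblocks covering one software tile."""
    return len({mapper(r, c)
                for r in range(row0, row0 + tile_rows)
                for c in range(col0, col0 + tile_cols)})


linear_count = superblocks_touched(linear_superblock, 32, 32)
tiled_count = superblocks_touched(tiled_superblock, 32, 32)
print(linear_count, tiled_count)
```

Under these assumed parameters the 32x32 tile spans 32 linear superblocks but only 8 rectangular ones, so fewer region allocations (and fewer L3 tag entries) cover the same working set. The actual shapes and gains reported in the talk depend on the evaluated kernels and hardware configuration.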
Speakers
S.R. Swamy Saranam Chongala (CS14D403)
Computer Science and Engineering