Item Details

Hierarchical Domain Partitioning for Hierarchical Architecture

Meng, J; Che, S; Huang, J; Li, J; Sheaffer, J; Skadron, K
Format
Report
Author
Meng, J
Che, S
Huang, J
Li, J
Sheaffer, J
Skadron, K
Abstract
The history of parallel computing shows that good performance is heavily dependent on data locality. Prior knowledge of data access patterns allows for optimizations that reduce data movement and thus lower data access latency. Compilers and runtime systems, however, have difficulty inferring locality among threads. Future multicore architectures are likely to present a hierarchical model of parallelism, with multiple threads on a core and multiple cores on a chip. In such a system, data affinity and localization become even more important for using per-core resources efficiently. We show how an application programming interface (API) with the right abstractions can conveniently indicate data locality, and how a system can use this information to place threads in a way that minimizes cache miss rates and interconnect traffic. This information is often well understood and easily expressed by the programmer but is typically lost to the system, forcing runtime environments to rediscover it on the fly, a far more costly approach. Our system is particularly well suited to the trend in manycore architectures towards large numbers of simple cores connected by a decentralized interconnect fabric. We study a set of data-parallel benchmarks and show that our technique yields up to a 25% performance gain with a 17% reduction in energy.
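The abstract does not describe the API itself, so the following is only a hypothetical illustration of the general idea of hierarchical domain partitioning, not the authors' method. It assumes a 2-D grid domain, a 4-neighbour stencil access pattern, and a 2-D mesh of cores that each run several threads; all function names are invented here. The domain is first tiled across cores, then each core's tile is split among that core's threads, and the number of stencil edges crossing core boundaries is counted as a rough proxy for interconnect traffic, compared against a locality-oblivious round-robin mapping.

```python
from itertools import product


def partition_1d(n, parts):
    """Split range(n) into `parts` contiguous, near-equal (start, end) chunks."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def hierarchical_partition(rows, cols, core_grid, thread_grid):
    """Map each cell (r, c) to a (core, thread) pair (hypothetical helper).

    Level 1: tile the grid across a core_grid = (cr, cc) mesh of cores.
    Level 2: tile each core's block across thread_grid = (tr, tc) threads.
    """
    cr, cc = core_grid
    tr, tc = thread_grid
    owner = {}
    for ci, (r0, r1) in enumerate(partition_1d(rows, cr)):
        for cj, (c0, c1) in enumerate(partition_1d(cols, cc)):
            core = ci * cc + cj
            # Second level: threads on the same core get adjacent sub-tiles,
            # so data they share stays in that core's local cache.
            for ti, (s0, s1) in enumerate(partition_1d(r1 - r0, tr)):
                for tj, (u0, u1) in enumerate(partition_1d(c1 - c0, tc)):
                    thread = ti * tc + tj
                    for r in range(r0 + s0, r0 + s1):
                        for c in range(c0 + u0, c0 + u1):
                            owner[(r, c)] = (core, thread)
    return owner


def round_robin(rows, cols, n_cores, threads_per_core):
    """Locality-oblivious baseline: deal cells to cores in row-major order."""
    owner = {}
    for idx, (r, c) in enumerate(product(range(rows), range(cols))):
        owner[(r, c)] = (idx % n_cores, (idx // n_cores) % threads_per_core)
    return owner


def cross_core_edges(owner, rows, cols):
    """Count 4-neighbour stencil edges whose endpoints are on different cores."""
    count = 0
    for r, c in product(range(rows), range(cols)):
        for dr, dc in ((0, 1), (1, 0)):
            nr, nc = r + dr, c + dc
            if nr < rows and nc < cols and owner[(r, c)][0] != owner[(nr, nc)][0]:
                count += 1
    return count


if __name__ == "__main__":
    tiled = hierarchical_partition(8, 8, (2, 2), (2, 2))
    naive = round_robin(8, 8, 4, 4)
    print("hierarchical:", cross_core_edges(tiled, 8, 8))
    print("round-robin: ", cross_core_edges(naive, 8, 8))
```

On an 8×8 grid with a 2×2 mesh of cores and 2×2 threads per core, the tiled mapping cuts 16 stencil edges across cores, whereas the round-robin mapping cuts 56. This toy count only illustrates why conveying locality to the system can reduce inter-core traffic; it is not a reproduction of the report's benchmarks or measurements.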
Language
English
Date Received
2012-10-29
Published
University of Virginia, Department of Computer Science, 2008
Published Date
2008
Collection
Libra Open Repository
In Copyright

Availability

Access Online