A Many-core Architecture for In-Memory Data Processing

Sandeep Agrawal, Sam Idicula, Arun Raghavan, Evangelos Vlachos, Venkat Govindaraju, Venkatanathan Varadarajan, Cagri Balkesen, Georgios Giannikis, Erik Schlanger, Charlie Roth, Nipun Agarwal, Eric Sedlar

29 March 2017

We live in an information age, with data and analytics guiding a large portion of our daily decisions. Data is being generated at a tremendous pace from connected cars, connected homes and connected workplaces, and extracting useful knowledge from this data is quickly becoming an impractical task. Single-threaded performance has saturated over the last decade, and there is a growing need for custom solutions that keep pace with these workloads in a scalable and efficient manner. A large portion of the power consumed by analytics workloads goes to bringing data to the processing cores, and we aim to optimize that data movement. We present the Database Processing Unit, or DPU, a shared-memory many-core that is specifically designed for in-memory analytics workloads. The DPU contains a unique Data Movement System (DMS), which provides hardware acceleration for data movement and preprocessing operations. The DPU also accelerates core-to-core communication via a unique hardware RPC mechanism called the Atomic Transaction Engine, or ATE. Comparing a fabricated DPU chip against a variety of state-of-the-art x86 applications shows a performance/Watt advantage of 3x to 16x.


Venue: The 50th Annual IEEE/ACM International Symposium on Microarchitecture