Automatic Task-based Code Generation for High Performance Domain Specific Embedded Languages
Providing high-level tools for parallel programming while sustaining a high level of performance has been a challenge that techniques such as Domain Specific Embedded Languages (DSELs) try to solve. In previous work, we investigated the design of such a DSEL, NT2, which provides a Matlab-like syntax for parallel numerical computations inside a C++ library.
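As a rough illustration of how a C++ library can offer Matlab-like array syntax, the sketch below uses expression templates. The operators build a lightweight expression tree, and assignment evaluates that tree in a single fused loop. This is a generic, minimal sketch assuming a 1-D vector type. The names `Vec`, `Expr`, `BinOp`, `Add` and `Mul` are hypothetical and are not NT2's API. Here `*` is element-wise, like Matlab's `.*`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical expression-template sketch; not NT2's actual API.

// CRTP base that marks a type as a node of an expression tree.
template <typename Derived>
struct Expr {
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

// Leaf node: a dense 1-D vector that owns its storage.
struct Vec : Expr<Vec> {
    std::vector<double> data;

    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}

    // Building a Vec from any expression evaluates the whole tree in one
    // loop, so no temporary vector is allocated for sub-expressions.
    template <typename E>
    Vec(const Expr<E>& e) : data(e.self().size()) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Leaves are held by reference, inner nodes by value, so a stored tree
// never refers to an already-destroyed temporary node.
template <typename T> struct operand { using type = T; };
template <> struct operand<Vec> { using type = const Vec&; };

// Inner node: a lazily evaluated element-wise binary operation.
template <typename L, typename R, typename Op>
struct BinOp : Expr<BinOp<L, R, Op>> {
    typename operand<L>::type lhs;
    typename operand<R>::type rhs;

    BinOp(const L& l, const R& r) : lhs(l), rhs(r) {
        assert(l.size() == r.size());  // Matlab-style size check
    }
    double operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
    std::size_t size() const { return lhs.size(); }
};

struct Add { static double apply(double x, double y) { return x + y; } };
struct Mul { static double apply(double x, double y) { return x * y; } };

// Operators only build tree nodes; no arithmetic happens here.
template <typename L, typename R>
BinOp<L, R, Add> operator+(const Expr<L>& l, const Expr<R>& r) {
    return BinOp<L, R, Add>(l.self(), r.self());
}

template <typename L, typename R>
BinOp<L, R, Mul> operator*(const Expr<L>& l, const Expr<R>& r) {
    return BinOp<L, R, Mul>(l.self(), r.self());
}
```

With these definitions, `Vec r = a + b * c;` builds a `BinOp` tree and evaluates it element by element, without allocating a temporary for `b * c`. Because inner nodes are copied, a tree stored in an `auto` variable stays valid as long as its leaf vectors do.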
In this talk, we describe how NT2 has been redesigned for shared memory systems in an extensible and portable way, using a tiered Parallel Skeleton system built on asynchronous task management and automatic compile-time taskification of user-level code.
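A Parallel Skeleton captures a recurring parallel pattern once, so user code only has to supply the sequential work. The following minimal sketch maps a data-parallel transform skeleton onto asynchronous tasks using the standard `std::async`. The names `par_transform` and `grain` are hypothetical, and NT2's actual skeletons and task runtime are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical data-parallel "transform" skeleton. The skeleton owns the
// parallel structure (chunking, spawning, joining); the caller provides
// only the per-element function f.
// Note: Out must not be bool, since std::vector<bool> elements share bytes
// and concurrent writes to neighbouring elements would race.
template <typename In, typename Out, typename F>
void par_transform(const std::vector<In>& in, std::vector<Out>& out, F f,
                   std::size_t grain = 1024) {
    assert(in.size() == out.size());
    assert(grain > 0);

    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < in.size(); begin += grain) {
        const std::size_t end = std::min(begin + grain, in.size());
        // Each chunk becomes one asynchronous task writing a disjoint range.
        tasks.push_back(std::async(std::launch::async, [&in, &out, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = f(in[i]);
        }));
    }

    // Join: the skeleton completes only once every chunk has finished;
    // get() also rethrows any exception raised inside a chunk.
    for (auto& t : tasks) t.get();
}
```

For example, `par_transform(in, out, [](int x) { return 2L * x; }, 1000);` splits a 10,000-element input into ten tasks. Launching one thread per chunk keeps the sketch short. A runtime such as TBB or HPX would instead schedule the chunks onto a pool of worker threads.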
After a short introduction to NT2, we describe what Parallel Skeletons are and how they help in designing sound, composable and portable parallel software, how C++ DSEL expressions are transformed into parallel task graphs, and how the new NT2 system can run on top of various shared memory runtimes (including OpenMP, Intel TBB and HPX) under these premises. Finally, we'll evaluate this new design with several benchmarks implementing linear algebra algorithms, showing that it performs on par with, or even outperforms, some state-of-the-art linear algebra libraries.
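To make the idea of a task graph concrete, the sketch below turns three array statements into task nodes and expresses their dependencies as `std::shared_future` objects. The first two statements are independent and may run concurrently, while the third waits for both. This differs from the automatic compile-time taskification described above, because here the graph is written by hand at run time. The names `make_node`, `run_graph` and `Array` are hypothetical, and `std::async` simply stands in for a runtime such as OpenMP, TBB or HPX.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical hand-built task graph; not how NT2 derives its graphs.
using Array = std::vector<double>;

// A node runs `body` after all of its predecessors (`deps`) have completed.
// Each incoming edge of the task graph is one shared_future in `deps`.
template <typename Body>
std::shared_future<void> make_node(std::vector<std::shared_future<void>> deps,
                                   Body body) {
    return std::async(std::launch::async, [deps = std::move(deps), body] {
        for (const auto& d : deps) d.get();  // wait; rethrows upstream errors
        body();
    }).share();
}

// Graph for the statements:  a = b + c;  d = 2 * b;  e = a + d;
// a and d do not depend on each other; e depends on both.
Array run_graph(const Array& b, const Array& c) {
    const std::size_t n = b.size();
    assert(c.size() == n);
    Array a(n), d(n), e(n);

    auto t1 = make_node({}, [&] { for (std::size_t i = 0; i < n; ++i) a[i] = b[i] + c[i]; });
    auto t2 = make_node({}, [&] { for (std::size_t i = 0; i < n; ++i) d[i] = 2.0 * b[i]; });
    auto t3 = make_node({t1, t2}, [&] { for (std::size_t i = 0; i < n; ++i) e[i] = a[i] + d[i]; });

    t3.get();  // t3 finishes only after t1 and t2 have finished
    return e;
}
```

Because `get()` on a `shared_future` rethrows any exception raised in a predecessor, a failure in one node propagates to every node that depends on it and finally to the caller of `run_graph`.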
Speaker: Joel Falcou