
FraSPA

This research develops a Framework for Synthesizing Parallel Applications (FraSPA) in a user-guided manner. FraSPA facilitates the synthesis of parallel applications from existing sequential applications and middleware components for multiple platforms and diverse domains. The framework design is based upon design patterns and generative programming techniques. The main goal of this research is to raise the level of abstraction of the widely used low-level parallel programming approaches. This work will also demonstrate a technique for separating parallel and sequential concerns, and will contribute design patterns and Domain-Specific Languages (DSLs) for parallel computing. Together, the design patterns and DSLs will promote code reuse and correctness, reduce code complexity, ease code maintenance, and increase end-user productivity. This research can be broadly classified as “Software Engineering for High Performance Computing”.

Background and Motivation

A gradual shift from homogeneous to heterogeneous architectures is being observed in High Performance Computing (HPC). The combination of CPUs, Cell processors, field-programmable gate arrays, and graphics processing units is being touted as the next revolution in HPC. However, to exploit this powerful combination of processing elements, there must be adequate software support so that developing software for these heterogeneous architectures becomes fast, easy, and cost-effective. The development environment should be an extensible open system that increases the productivity of end-users without compromising the performance and accuracy of the applications. An application development environment for homogeneous architectures is the first step toward the larger goal of an application development environment for heterogeneous architectures, and is the main focus of this research.

The computational power of distributed homogeneous architectures can be exploited through parallel programming, and the Message Passing Interface (MPI) [1] is the most widely used standard for this purpose. The process of writing a parallel application using MPI APIs often begins with a working sequential application. The concurrency in the sequential application is identified and, depending upon the underlying hardware architecture and application characteristics, a data and task distribution scheme is selected. The APIs for communication are then inserted into the sequential application to derive a parallel version, which is further optimized for the machine architecture to obtain maximum efficiency or speedup. Parallelization thus often becomes a reengineering process that necessitates intrusive changes to the sequential application and entails a great deal of time, cost, and effort. This intrusive reengineering is difficult because of the complexities associated with it: the lack of proper error detection and handling mechanisms, race conditions, and the burden on programmers to explicitly map computational tasks to processors. Because the parallelism is explicit in the code, the code itself becomes more complex. The MPI layer provides a poor level of abstraction, as it deals with explicit buffers and message transfers and therefore exposes data-structure details to the programmer. Despite these challenges and complexities, MPI is the most popular standard for writing parallel applications; its main advantages are speed and portability. Hence, to effectively exploit the HPC power of low-cost distributed-memory architectures, parallel programming based on the most widely used standard (i.e., MPI) should be made easy.

The source code of legacy HPC applications is usually monolithic and difficult to maintain and reuse [2]. The effort spent in writing a parallel application is often replicated by other programmers working on similar applications. When there are multiple implementations or solutions for an application, as in the case of poly-algorithms [3], it becomes difficult to manage all the solutions and keep them consistent when an update or modification is required. Code clones across the various solutions compound this maintenance problem.
These problems of maintenance, code replication, architecture-specific code optimization, and intrusive reengineering in explicit parallelization with MPI APIs are the main motivating factors behind building a framework for semi-automatically synthesizing parallel applications.
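
To make the intrusiveness concrete, here is a minimal sketch of the kind of hand-parallelization described above; the block partitioning, variable names, and reduction pattern are illustrative assumptions, not part of FraSPA.

```c
/* Illustrative only: a hand-parallelized loop of the kind described above.
 * The block partitioning and variable names are assumptions for this sketch. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    double local_sum = 0.0, global_sum = 0.0;
    int rank, size;

    MPI_Init(&argc, &argv);               /* boilerplate added to the sequential code */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The original sequential loop ran over 0..N-1; the programmer must
     * now derive each process's sub-range explicitly. */
    int chunk = N / size;
    int lo = rank * chunk;
    int hi = (rank == size - 1) ? N : lo + chunk;

    for (int i = lo; i < hi; i++)
        local_sum += (double)i;           /* original loop body */

    /* Explicit communication to combine the partial results. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```

Every MPI call in this sketch is an edit that must be threaded by hand through the sequential program and revisited whenever the distribution scheme changes; this is precisely the burden the framework aims to lift.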

Overview of the Approach

The high-level model of this approach is presented pictorially in Figure 1. As shown in the figure, the main components of FraSPA are: design patterns [4] and templates, a library of generic code constructs, a set of DSLs, application-specific code constructs, a model weaver [5], and code transformation tools [6].

Figure 1. High-level model of the approach

The end-user specifies the architecture for which the code is to be synthesized through the DSL for architecture specification. On the basis of this specification, an appropriate design template is chosen from the repository, and generic code constructs (e.g., checkpointing) are selected from the library. The design template and the generic code constructs are woven together using the code transformation tool, and the output is a code skeleton with stubs for the application-specific code constructs. A DSL for application specification is used to map the application-specific code constructs to the stubs in the design templates. The DSL for parallel computations gathers the necessary information from the user about the functions that should be parallelized; together with the architecture specification, it is used to recommend a computation pattern for each such function. A model weaver and a code transformation tool, along with the sequential program, then combine the various code components to generate a parallel program. The DSL specifications are communicated to the program transformation engine through the transformation rules generated in the model-weaving and transformation step.

A simple example of a transformation driven by a DSL specification is shown in Figure 2. The end-user specifies through the DSL that the “for loop” with the initialization statement “i=0” is to be parallelized. The MPI APIs required for communication and synchronization, along with the other code needed to execute the loop in parallel, are inserted into the existing source code by the program transformation engine. The transformation rules required for non-intrusively parallelizing sequential applications have been developed in this research; the set of DSLs and templates is still under development.

Figure 2. Sample transformation
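
Figure 2 is only schematic; the sketch below pairs a hypothetical DSL fragment (shown in comments, since the actual DSL syntax is still under development) with the kind of code a transformation engine might weave around the targeted loop. All names, the block-distribution choice, and the gather step are assumptions made for illustration, not FraSPA's actual output.

```c
/* Hypothetical DSL input (the actual FraSPA DSL syntax is under development):
 *
 *   parallelize for-loop with init "i = 0" in function compute
 *     using pattern block-distribution
 */
#include <mpi.h>

/* BEFORE: the sequential loop supplied by the user. */
void compute(double *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] = a[i] * 2.0;
}

/* AFTER: illustrative output of the transformation engine. The loop body is
 * untouched; only the bounds and a final gather are woven in around it.
 * For simplicity this sketch assumes n is divisible by the process count. */
void compute_parallel(double *a, int n) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = n / size;                   /* block distribution */
    int lo = rank * chunk;

    for (int i = lo; i < lo + chunk; i++)   /* original body, new bounds */
        a[i] = a[i] * 2.0;

    /* Each rank's slice already sits at its final offset in a, so an
     * in-place all-gather shares the full array with every process. */
    MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                  a, chunk, MPI_DOUBLE, MPI_COMM_WORLD);
}
```

The key property illustrated is non-intrusiveness: the original loop body stays as written, and all parallel concerns live in woven-in code that the user never writes by hand.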

Contributions and Evaluation Metrics

The main contribution of this dissertation will be an extensible framework for generating parallel code for heterogeneous architectures without invasive reengineering. This work will complement other research efforts in generative programming, parallel compilers, and new languages for parallel programming. Other contributions will be:

  • Reduction in the complexities associated with developing HPC applications through a layer of abstraction on top of MPI.
  • Design patterns, templates, and DSLs for parallel programming.
  • Demonstration of a technique for separating sequential and parallel concerns for improved code maintenance and reuse.
  • Increased user productivity, measured as the decrease in the number of lines of code written manually.

This work will exemplify the amalgamation of good software engineering practices and HPC, and will demonstrate a methodology for composing HPC applications from reusable components. The DSLs developed in this research will show an approach to raising the level of abstraction of programming languages without compromising the comprehensibility of the code or its performance. As a proof of concept, test problems will be selected from the following domains: numerical computing, image processing, data mining, image searching, and bioinformatics. FraSPA will be evaluated for application portability and execution performance. The synthesized code and manually written code for these test cases will be compared for performance and accuracy, using total run-time, speedup, and scaled speedup for the performance analysis. The number of lines of code the user has to write with and without the framework will be compared, as will the number of changes made manually versus by the framework; these comparisons will be used to quantify the reduction in workload and the usability of the framework. An estimate of the number of lines of reusable code will be presented for each test case as evidence of the reusability of the code in the framework.
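
For reference, the standard definitions underlying these performance metrics are sketched below; the notation (T_p for the total run-time on p processors, alpha for the serial fraction of the work) is assumed here for illustration and is not prescribed by the framework.

```latex
% Standard metric definitions assumed for this sketch:
% T_p  : total run-time on p processors
% alpha: serial (non-parallelizable) fraction of the work
\[
  S(p) = \frac{T_1}{T_p}
  \qquad \text{(speedup)}
\]
\[
  S_{\mathrm{scaled}}(p) = p - \alpha\,(p - 1)
  \qquad \text{(scaled speedup, Gustafson's law)}
\]
```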

References

[1] Message Passing Interface Forum, MPI-2: A Message-Passing Interface Standard, International Journal of Supercomputer Applications and High Performance Computing, Special Issue, Vol. 12, No. 1-2, pp. 1-299, 1998.

[2] Carver, J., Post-Workshop Report for the Third International Workshop on Software Engineering for High Performance Computing Applications (SEHPC07), ACM SIGSOFT Software Engineering Notes, Vol. 32, No. 5, pp. 38-43, 2007.

[3] Skjellum, A., and Bangalore, P., Driving Issues in Scalable Libraries: Poly-Algorithms, Data Distribution Independence, Redistribution, Local Storage Schemes, Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, California, pp. 734-737, 1995.

[4] Mattson, T. G., Sanders, B. A., and Massingill, B. L., Patterns for Parallel Programming, Addison-Wesley, 2004.

[5] Bézivin, J., Jouault, F., and Valduriez, P., First Experiments with a ModelWeaver, OOPSLA & GPCE Workshop, 2004.

[6] Czarnecki, K., and Eisenecker, U., Generative Programming: Methods, Tools, and Applications, Addison-Wesley, 2000.

People

  • Ritu Arora
  • Puri Bangalore

Publications

  • Ritu Arora (Advisor: Purushotham Bangalore). A Framework for Raising the Level of Abstraction of Explicit Parallelization, accepted in the ICSE 2009 Doctoral Symposium, International Conference on Software Engineering, Vancouver, Canada, May 16-24, 2009.
  • Ritu Arora and Purushotham Bangalore. FraSPA: A Framework for Synthesizing Parallel Applications, Grace Hopper Celebration of Women in Computing (GHC 2008), Keystone Resort, Colorado, October 1-4, 2008.