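Wen-Mei W. Hwu is the Sanders-AMD Endowed Chair Professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. His research interests are in the architecture, implementation, compilation, and algorithms for parallel computing. He is chief scientist of the Parallel Computing Institute and director of the IMPACT research group. He is a co-founder and CTO of MulticoreWare. For his research and teaching, he has received the ACM SigArch Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the Tau Beta Pi Daniel C. Drucker Eminent Faculty Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award, and the Distinguished Alumni Award in Computer Science from the University of California, Berkeley. He is a Fellow of both the IEEE and the ACM. He directs the UIUC CUDA Center and is one of the principal investigators of the NSF Blue Waters Petascale computer project. Dr. Hwu received his Ph.D. in Computer Science from the University of California, Berkeley.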
Preface
Acknowledgements
CHAPTER 1 Introduction
1.1 Heterogeneous Parallel Computing
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism
1.4 Speeding Up Real Applications
1.5 Challenges in Parallel Programming
1.6 Parallel Programming Languages and Models
1.7 Overarching Goals
1.8 Organization of the Book
References
CHAPTER 2 Data Parallel Computing
2.1 Data Parallelism
2.2 CUDA C Program Structure
2.3 A Vector Addition Kernel
2.4 Device Global Memory and Data Transfer
2.5 Kernel Functions and Threading
2.6 Kernel Launch
2.7 Summary
Function Declarations
Kernel Launch
Built-in (Predefined) Variables
Run-time API
2.8 Exercises
References
CHAPTER 3 Scalable Parallel Execution
3.1 CUDA Thread Organization
3.2 Mapping Threads to Multidimensional Data
3.3 Image Blur: A More Complex Kernel
3.4 Synchronization and Transparent Scalability
3.5 Resource Assignment
3.6 Querying Device Properties
3.7 Thread Scheduling and Latency Tolerance
3.8 Summary
3.9 Exercises
CHAPTER 4 Memory and Data Locality
4.1 Importance of Memory Access Efficiency
4.2 Matrix Multiplication
4.3 CUDA Memory Types
4.4 Tiling for Reduced Memory Traffic
4.5 A Tiled Matrix Multiplication Kernel
4.6 Boundary Checks
4.7 Memory as a Limiting Factor to Parallelism
4.8 Summary
4.9 Exercises
……
CHAPTER 17 Parallel Programming and Computational Thinking
17.1 Goals of Parallel Computing
17.2 Problem Decomposition