Preface xv

CHAPTERS

1 Computer Abstractions and Technology 2
1.1 Introduction 3
1.2 Seven Great Ideas in Computer Architecture 10
1.3 Below Your Program 13
1.4 Under the Covers 16
1.5 Technologies for Building Processors and Memory 24
1.6 Performance 28
1.7 The Power Wall 40
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 43
1.9 Real Stuff: Benchmarking the Intel Core i7 46
1.10 Going Faster: Matrix Multiply in Python 49
1.11 Fallacies and Pitfalls 50
1.12 Concluding Remarks 53
1.13 Historical Perspective and Further Reading 55
1.14 Self-Study 55
1.15 Exercises 59

2 Instructions: Language of the Computer 66
2.1 Introduction 68
2.2 Operations of the Computer Hardware 69
2.3 Operands of the Computer Hardware 72
2.4 Signed and Unsigned Numbers 79
2.5 Representing Instructions in the Computer 86
2.6 Logical Operations 93
2.7 Instructions for Making Decisions 96
2.8 Supporting Procedures in Computer Hardware 102
2.9 Communicating with People 112
2.10 MIPS Addressing for 32-Bit Immediates and Addresses 118
2.11 Parallelism and Instructions: Synchronization 127
2.12 Translating and Starting a Program 129
2.13 A C Sort Example to Put It All Together 138
2.14 Arrays versus Pointers 147
2.15 Advanced Material: Compiling C and Interpreting Java 151
2.16 Real Stuff: ARMv7 (32-bit) Instructions 151
2.17 Real Stuff: ARMv8 (64-bit) Instructions 155
2.18 Real Stuff: RISC-V Instructions 156
2.19 Real Stuff: x86 Instructions 157
2.20 Going Faster: Matrix Multiply in C 166
2.21 Fallacies and Pitfalls 167
2.22 Concluding Remarks 169
2.23 Historical Perspective and Further Reading 172
2.24 Self-Study 172
2.25 Exercises 175

3 Arithmetic for Computers 186
3.1 Introduction 188
3.2 Addition and Subtraction 188
3.3 Multiplication 193
3.4 Division 199
3.5 Floating Point 206
3.6 Parallelism and Computer Arithmetic: Subword Parallelism 232
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 234
3.8 Going Faster: Subword Parallelism and Matrix Multiply 235
3.9 Fallacies and Pitfalls 237
3.10 Concluding Remarks 241
3.11 Historical Perspective and Further Reading 245
3.12 Self-Study 245
3.13 Exercises 248

4 The Processor 254
4.1 Introduction 256
4.2 Logic Design Conventions 260
4.3 Building a Datapath 263
4.4 A Simple Implementation Scheme 271
4.5 A Multicycle Implementation 284
4.6 An Overview of Pipelining 285
4.7 Pipelined Datapath and Control 298
4.8 Data Hazards: Forwarding versus Stalling 315
4.9 Control Hazards 328
4.10 Exceptions 337
4.11 Parallelism via Instructions 344
4.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53 358
4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply 366
4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 368
4.15 Fallacies and Pitfalls 369
4.16 Concluding Remarks 370
4.17 Historical Perspective and Further Reading 371
4.18 Self-Study 371
4.19 Exercises 372

5 Large and Fast: Exploiting Memory Hierarchy 390
5.1 Introduction 392
5.2 Memory Technologies 396
5.3 The Basics of Caches 401
5.4 Measuring and Improving Cache Performance 416
5.5 Dependable Memory Hierarchy 436
5.6 Virtual Machines 442
5.7 Virtual Memory 446
5.8 A Common Framework for Memory Hierarchy 472
5.9 Using a Finite-State Machine to Control a Simple Cache 479
5.10 Parallelism and Memory Hierarchies: Cache Coherence 484
5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 488
5.12 Advanced Material: Implementing Cache Controllers 488
5.13 Real Stuff: The ARM Cortex-A8 an…