Item Details

Practical PRAM Programming

Jörg Keller, Christoph Kessler, and Jesper Träff
Format
Book
Published
New York : John Wiley, c2001.
Language
English
Series
Wiley Series on Parallel and Distributed Computing
ISBN
0471353515 (cloth : alk. paper)
Contents
  • 1.1 A Brief Introduction to the PRAM Model 1
  • 1.2 PRAM as a Practical Model of Parallel Computation 4
  • 1.3 First Steps in Fork 5
  • 1.4 A Parallel CRCW Quicksort Implementation 9
  • 1.4.1 CRCW Quicksort Algorithm by Chlebus and Vrto 9
  • 1.4.2 Implementation of the Algorithm in Fork 12
  • 1.4.3 Empirical Evaluation of the Implementation 17
  • 2 PRAM Model of Parallel Computation 23
  • 2.1 RAM Model of Sequential Computation 23
  • 2.2 PRAM Model 28
  • 2.2.1 Computing the Sum on a PRAM 32
  • 2.3 Measures of Efficiency of PRAM Algorithms 35
  • 2.3.1 Self-Simulation and Brent's Theorem 39
  • 2.3.2 Speedup 42
  • 2.3.3 Design Goals for PRAM Algorithms 45
  • 2.3.4 Practical Performance Measures for PRAM Algorithms 47
  • 2.3.5 Scalability of Parallel Algorithms 48
  • 2.3.6 Efficiency and Performance of the Sum Algorithm 49
  • 2.3.7 Efficiency and Self-Simulation 51
  • 2.4 Some PRAM Algorithms 52
  • 2.4.1 Prefix-Sums Problem 52
  • 2.4.2 Finding the Maximum 57
  • 2.4.3 An Operation on Lists 58
  • 2.4.4 Making the PRAM Algorithms More Cost-Efficient 62
  • 2.4.5 An Application of Brent's Theorem 64
  • 2.5 Relative Strength of the PRAM Variants 65
  • 2.5.1 Simulation Results 65
  • 2.5.2 Separation Theorems 68
  • 2.6 Further PRAM Variants and Related Models 69
  • 2.6.1 Even Stronger PRAM Variants 69
  • 2.6.2 PRAM Variants that Measure Communication 70
  • 2.6.3 PRAM Variants that Measure Contention 71
  • 2.6.4 Asynchronous PRAM Variants 72
  • 2.6.5 Bridging Models: BSP and Related Models 74
  • 3 PRAM Emulation 81
  • 3.1 Simulation of the PRAM on Distributed Memory Machines 81
  • 3.2 Simulation of the PRAM 84
  • 3.3 Communication Networks 86
  • 3.4 Distributing Shared Memory by Hashing 88
  • 3.5 Simulating One PRAM Time Step 91
  • 3.6 Analysis of Ranade's Simulation 96
  • 3.6.1 Simulating Several Steps 101
  • 3.6.2 Making the Simulation More Efficient 102
  • 3.7 Deterministic Simulations 102
  • 3.8 Simulation of the PRAM on Optical Networks 104
  • 4 An Emulation of a PRAM in Hardware 107
  • 4.1 Fluent Abstract Machine 108
  • 4.1.1 Ranade's Algorithm Revisited 108
  • 4.1.2 Concurrent Access and Multiprefix 109
  • 4.1.3 Fluent Machine 110
  • 4.2 SB-PRAM Architecture 110
  • 4.2.1 Reengineering the Fluent Machine 110
  • 4.2.2 Choice of the Hash Function 116
  • 4.3 Hardware Realization of the Prototype 118
  • 4.3.1 Processor 119
  • 4.3.2 Sorting Arrays 119
  • 4.3.3 Processor Modules 121
  • 4.3.4 Memory Modules 121
  • 4.3.5 Routing Chips 121
  • 4.3.6 Network Boards 123
  • 4.3.7 Commercial Parts 124
  • 4.4 Instruction Set 125
  • 4.4.1 Compute Instructions 126
  • 4.4.2 Load and Store Instructions 128
  • 4.4.3 Multiprefix Instructions 128
  • 4.4.4 Jump Instructions 130
  • 4.4.5 PRAM Assembler Shorthands 130
  • 4.5 System Software 132
  • 4.5.1 Assembler, Linker, and Loader 133
  • 4.5.2 Simulator 136
  • 4.5.3 Running Assembler Programs 137
  • 4.5.4 Operating System 139
  • 4.6 SB-PRAM Assembler Programming 140
  • 4.6.1 Exact Synchronization by Padding 142
  • 4.6.2 Inexact Barriers 142
  • 4.6.3 Inexact Barrier with Separation 143
  • 4.6.4 Exact Barrier 145
  • 4.6.5 Self-Restoring Exact Barrier 147
  • 4.6.6 Alternatives to Busy Waiting 147
  • 4.7 Other Shared Memory Machines 148
  • 4.7.1 Emulated Shared Memory 148
  • 4.7.2 Distributed Shared Memory 152
  • 4.7.3 Hybrid Architectures 156
  • 4.7.4 Sources of Information 156
  • 5 Fork 159
  • 5.1 Shared and Private Variables 162
  • 5.1.1 Declaration 162
  • 5.1.2 System Variables 163
  • 5.1.3 Concurrent-Write Conflict Resolution 163
  • 5.2 Expressions 164
  • 5.2.1 Atomic Multiprefix Instructions 164
  • 5.2.2 Ilog2 Operator 165
  • 5.2.3 Return Values of Functions 166
  • 5.2.4 Shared and Private Expressions 166
  • 5.3 Synchronous and Asynchronous Regions 166
  • 5.3.1 Advantages of the Synchronous Mode 167
  • 5.3.2 Advantages of the Asynchronous Mode 168
  • 5.3.3 Declaring the Synchronicity of a Function 168
  • 5.3.4 Farm Statement 169
  • 5.3.5 Seq Statement 169
  • 5.3.6 Start Statement 170
  • 5.3.7 Calling Synchronous, Straight, and Asynchronous Functions 171
  • 5.4 Synchronous Execution and the Group Concept 171
  • 5.4.1 Synchronicity Invariant 172
  • 5.4.2 Subgroup Creation and Group Activity Regions 173
  • 5.4.3 Group ID @, Group-Local Processor ID $, and Group Rank $$ 173
  • 5.4.4 Synchronous if Statements with Private Condition 174
  • 5.4.5 Synchronous Loops with Private Exit Condition 178
  • 5.4.6 Calls to Synchronous Functions 180
  • 5.4.7 Fork Statement 180
  • 5.5 Pointers and Heaps 183
  • 5.5.3 Pointers to Functions 185
  • 5.6 Exploring the Asynchronous Mode 186
  • 5.6.1 Barrier Statement 186
  • 5.6.2 Critical Sections, Locks, and Semaphores 186
  • 5.7 Join Statement 197
  • 5.7.1 Simulating the start Statement by join 200
  • 5.7.2 Synchronous Parallel Critical Sections 200
  • 5.8 Programming Style and Caveats 203
  • 5.8.1 Absolute versus Relative Group Identifiers 203
  • 5.8.2 Numbering of Group-Relative Processor Identifiers 204
  • 5.8.3 Accessing Objects Shared by Different Groups 204
  • 5.8.4 Jumps 206
  • 5.8.5 Shared Memory Fragmentation 206
  • 5.8.6 Asm Statement 207
  • 5.8.7 Short-Circuit Evaluation of && and || 207
  • 5.9 Graphical Trace File Visualization 208
  • 5.10 Other Parallel Programming Languages 212
  • 5.10.1 MIMD Message Passing Environments 213
  • 5.10.2 MIMD Asynchronous Shared Memory Environments 214
  • 5.10.3 Fork-Join-Style Shared Memory MIMD Languages 214
  • 5.10.4 SPMD-Style Asynchronous Shared Memory MIMD Languages 215
  • 5.10.5 SIMD and Dataparallel Programming Languages 216
  • 5.10.6 PRAM Languages 217
  • 5.10.7 History of Fork 218
  • 5.10.8 Future of Fork 220
  • 5.10.9 BSP Languages 221
  • 5.10.10 Integration of Task Parallelism and Data Parallelism 221
  • 5.10.11 Skeleton Languages 222
  • 6 Implementation of Fork 225
  • 6.1 Compilation for the SB-PRAM 225
  • 6.1.1 Extensions to the C Compiler Phases and Data Structures 225
  • 6.1.2 Shared Memory Organization and Frame Layout 226
  • 6.1.3 Translation of start and join 228
  • 6.1.4 Translation of the Private if Statement 228
  • 6.1.5 Groups and Control Flow 231
  • 6.1.6 Accessing Shared Local Variables 231
  • 6.1.7 Runtime Overheads and Optimizations 232
  • 6.2 Implementation of the Fork Standard Library for the SB-PRAM 235
  • 6.2.1 Implementation of Barrier Synchronization 235
  • 6.2.2 Implementation of the Simple Lock 235
  • 6.2.3 Implementation of the Fair Lock 235
  • 6.2.4 Implementation of the Reader-Writer Lock 237
  • 6.2.5 Implementation of the Reader-Writer-Deletor Lock 240
  • 6.3 Compiling Fork for Other Parallel Architectures 240
  • 6.3.1 Emulating Additional PRAM Processors in Software 243
  • 6.3.2 Compiling for Asynchronous Shared Memory Architectures 250
  • 6.3.3 Compiling for Distributed Memory Architectures 254
  • 7 Parallel Algorithmic Paradigms and Programming Techniques 259
  • 7.1 Parallel Loops 262
  • 7.1.1 Dependence Analysis of Sequential Loops 262
  • 7.1.2 Loop Transformations 267
  • 7.1.3 Exploiting Loop Parallelism in Fork 271
  • 7.2 Data Parallelism 281
  • 7.2.1 Independent Data Parallelism 282
  • 7.2.2 Nestable Skeleton Functions 286
  • 7.2.3 Reductions 288
  • 7.2.4 Prefix Computations 291
  • 7.2.5 Matrix Operations 294
  • 7.3 Parallel Divide-and-Conquer 302
  • 7.3.1 Sequential Divide-and-Conquer 302
  • 7.3.2 Parallel Divide-and-Conquer 305
  • 7.3.3 Recursive Parallel Sum Algorithm 307
  • 7.3.4 Strassen's Recursive Matrix-Matrix Multiplication Method 310
  • 7.3.5 Recursive Doubling Method for First-Order Linear Recurrences 314
  • 7.4 Asynchronous Parallel Data Structures 318
  • 7.4.1 A Parallel Rehashable Hash Table 318
  • 7.4.2 A Nonrehashable Parallel Hash Table 324
  • 7.4.3 A Parallel FIFO Queue 324
  • 7.4.4 A Parallel Skip List 330
  • 7.5 Asynchronous Parallel Task Queue 339
  • 7.6 Message Passing 342
  • 7.6.1 MPI Core Routines 347
  • 7.6.2 Implementation of the MPI Core Routines in Fork 350
  • 7.6.3 Selected Collective Communication Routines 352
  • 7.6.4 Data Distribution, Array Alignment, and Collective Communication 354
  • 7.7 Pipelining 356
  • 7.7.1 Synchronous Pipelining 358
  • 7.7.2 Asynchronous Pipelining 363
  • 7.8 Domain Decomposition and Irregular Parallelism 366
  • 7.8.1 N-Body Problem 369
  • 7.8.2 A Simple Force Calculation Algorithm 370
  • 7.8.3 A Hierarchical Force Calculation Algorithm 372
  • 8 A Library of Basic PRAM Algorithms 389
  • 8.2 Arrays 393
  • 8.2.1 Load Balancing for Arrays 394
  • 8.2.2 Prefix-Sums Operations 395
  • 8.2.3 Improving Performance 398
  • 8.2.4 Comparison of the Prefix Operations on the SB-PRAM 398
  • 8.2.5 Other Prefix Computations 401
  • 8.2.6 Range Querying 404
  • 8.3 Merging and Sorting 407
  • 8.3.1 Parallel Search 407
  • 8.3.2 Merging 411
  • 8.3.3 A Mergesort Function 415
  • 8.3.4 Integer Sorting 417
  • 8.4 Lists and List Ranking 419
  • 8.4.1 Simple Operations on Lists 420
  • 8.4.2 List Ranking with Pointer Jumping 423
  • 8.4.3 Improving the Parallel Efficiency of List Ranking 425
  • 8.5 Tree Computations 427
  • 8.5.1 Tree Rooting 430
  • 8.5.2 Lowest Common Ancestors 433
  • 8.5.3 Tree Contraction 437
  • 8.5.4 An Application of Tree Contraction 439
  • 8.5.5 Tree Generators 443
  • 8.6 Fundamental Algorithms on Graphs 443
  • 8.6.1 Connected Components 444
  • 8.6.2 Graph Generators 447
  • 8.7 A Parallel Dictionary 447
  • 8.7.1 Searching in a Dictionary 449
  • 8.7.2 Insertion and Deletion 450
  • 8.8 Other Libraries of PRAM Algorithms 454
  • 9 An Example Application 461
  • 9.1 Principles of Fisheye Views 461
  • 9.1.1 Computation of Fisheye Views 462
  • 9.1.2 Extension to Focus Polygons 466
  • 9.2 Implementation 470
  • 9.3 Simple Parallelization 473
  • 9.3.1 Exploiting Loop Parallelism 473
  • 9.3.2 Handling Large Files 476
  • 9.4 Advanced Parallelization 480
  • 9.4.1 Skip-List-Based Dictionary 481
  • 9.4.2 Tree-Based Dictionary 483
  • 9.4.3 Comparison 484
  • 9.5 Generation of Image Files 485
  • 9.6 Extension of FView for Interactive Use 487
  • Appendix A Notation for Asymptotic Growth of Functions 491
  • A.1 "Big Oh" Notation 491
  • Appendix B Online Software and Documentation 495
  • Appendix C Installation Guide 497
  • C.1 Installation of the SB-PRAM System Software 498
  • C.2 Installation of the Fork Compiler 498
  • Appendix D Fork at a Glance 501
  • Appendix E Technical Remarks on the Fork Compiler 507
  • E.1 Fork Compiler Options 507
  • E.2 Organization of the Fork Standard Library 507
  • E.3 Reserved Registers 509
Description
xxiii, 572 p. : ill. ; 24 cm.
Notes
  • "A Wiley-Interscience publication."
  • Includes bibliographical references (p. 531-561) and index.
Technical Details
  • Staff View

    LEADER 12432pam a22004094a 4500
    001 u3696837
    003 SIRSI
    005 20010124092356.0
    008 000907s2001 nyua b 001 0 eng
    010 __ a| 00047314
    020 __ a| 0471353515 (cloth : alk. paper)
    035 __ a| (Sirsi) i9780471353515
    035 __ a| (Sirsi) i0471353515
    035 __ a| (OCoLC)44933497
    040 __ a| DLC c| DLC d| NhCcYBP d| MvI
    042 __ a| pcc
    050 00 a| QA76.642 b| .K45 2001
    082 00 a| 005.2/75 2| 21
    090 __ a| SCIENG/QA76.642 b| .K45 2001
    100 1_ a| Keller, Jörg.
    245 10 a| Practical PRAM programming / c| Jörg Keller, Christoph Kessler and Jesper Träff.
    260 __ a| New York : b| John Wiley, c| c2001.
    300 __ a| xxiii, 572 p. : b| ill. ; c| 24 cm.
    440 _0 a| Wiley series on parallel and distributed computing
    500 __ a| "A Wiley-Interscience publication."
    504 __ a| Includes bibliographical references (p. 531-561) and index.
    596 __ a| 5
    650 _0 a| Parallel programming (Computer science)
    650 _0 a| Random access memory.
    700 1_ a| Kessler, Christoph q| (Christoph W.)
    700 1_ a| Träff, Jesper.
    999 __ a| QA76.642 .K45 2001 w| LC i| X004431293 l| STACKS m| SCI-ENG t| BOOK

Availability

Library | Location | Availability
Brown Science and Engineering | Stacks | Available