Please use this identifier to cite or link to this item: http://hdl.handle.net/1860/254

Title: Performance of parallel algorithms on a broadcast-based architecture
Authors: Narravula, Harsha V.
Keywords: Cache memory; Computer networks; Multiprocessors
Issue Date: 16-Jan-2004
Abstract: Research in high-end computing has produced enormous benefits to society. While new data- and computation-intensive applications appear all the time, there is evidence that present scalable parallel architectures may not be well suited to them. Achieving petaflops computing will require advances in hardware technology, architecture, system software, and programming environments. Advances in fiber optics and VLSI technology are making interconnection networks that allow multiple simultaneous broadcasts feasible. The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth fiber-optic network with a distinctive feature: each processor has a dedicated output channel on which it broadcasts to all other processors, so every processor is directly connected to every other processor. This thesis presents the multiprocessor architecture of the SOME-Bus and examines the performance of representative algorithms for matrix operations and sorting under the message-passing and distributed-shared-memory (DSM) paradigms. It shows that simple enhancements to the network interface and to the cache and directory controllers can greatly improve performance; for example, with DSM the communication time of a matrix-vector multiplication algorithm is reduced to O(1). Existing parallel loop schemes are extended to make them suitable for the high-end system under study, and efficient mapping of existing parallel software onto the system is studied. The software is implemented, tested, and evaluated for performance on a simulator developed for the system. The thesis also presents enhancements to the network interface and the cache and directory controllers that allow significant overlap of processing time with the communication time caused by compulsory misses.
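The intuition behind the O(1) communication result can be illustrated with a toy round-counting model. This is a hypothetical sketch, not the thesis's DSM protocol: it assumes a row-block partition of y = A x across p nodes, that node k initially owns block k of x and must broadcast it to every other node before computing, that a single shared bus carries only one broadcast per round, and that a SOME-Bus-style network lets every node transmit on its own channel in the same round. Message size and receiver-side contention are ignored, and the names `split_rows`, `exchange_rounds`, and `parallel_matvec` are invented for this example.

```python
from __future__ import annotations


def split_rows(n: int, p: int) -> list[range]:
    """Split row indices 0..n-1 into p contiguous, nearly equal blocks."""
    base, extra = divmod(n, p)
    blocks, start = [], 0
    for k in range(p):
        size = base + (1 if k < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def exchange_rounds(p: int, network: str) -> int:
    """Rounds for every node to broadcast its block of x once (toy model)."""
    if network == "shared_bus":
        return p  # assumption: one shared medium, one sender per round
    if network == "some_bus":
        return 1  # assumption: each node owns a channel; all transmit at once
    raise ValueError(f"unknown network: {network}")


def parallel_matvec(a: list[list[float]], x: list[float], p: int,
                    network: str) -> tuple[list[float], int]:
    """Row-block y = A x on p simulated nodes; returns (y, rounds)."""
    rounds = exchange_rounds(p, network)
    y = [0.0] * len(a)
    # After the exchange every node holds all of x; node k computes its rows.
    for rows in split_rows(len(a), p):
        for i in rows:
            y[i] = sum(a[i][j] * x[j] for j in range(len(x)))
    return y, rounds


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    print(parallel_matvec(a, x, 2, "some_bus"))    # ([3.0, 7.0], 1)
    print(parallel_matvec(a, x, 2, "shared_bus"))  # ([3.0, 7.0], 2)
```

Under these assumptions the exchange takes one round on the broadcast network regardless of p, versus p rounds on a single shared bus. The model only conveys why simultaneous broadcasts remove the dependence on p; it does not reproduce the thesis's DSM mechanism or its measurements.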
Results from the simulated execution of simple algorithms such as matrix-matrix multiplication on the SOME-Bus show that block capture and prefetch, combined with an effective block replacement policy, significantly reduce the miss rate due to compulsory misses as the cache size increases, whereas the same increase in cache size on traditional architectures leaves the compulsory miss rate unaffected.
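The effect of block capture on compulsory misses can be sketched with a small trace-driven model. Everything here is an assumption for illustration: a fully associative cache with plain LRU replacement (the thesis's own replacement policy is not reproduced), a workload in which every processor reads every shared block exactly once in a staggered order (loosely resembling processors sweeping a shared operand of a matrix-matrix multiplication), and "capture" meaning that a block fetched on a miss is also installed in every other processor's cache. `LRUCache` and `simulate` are hypothetical names.

```python
from __future__ import annotations

from collections import OrderedDict


class LRUCache:
    """Fully associative cache of block IDs with LRU replacement."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines: OrderedDict[int, None] = OrderedDict()

    def lookup(self, block: int) -> bool:
        """Return True on a hit and mark the block most recently used."""
        if block in self.lines:
            self.lines.move_to_end(block)
            return True
        return False

    def insert(self, block: int) -> None:
        """Install a block, evicting the least recently used one if full."""
        if self.capacity <= 0:
            return
        if block in self.lines:
            self.lines.move_to_end(block)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[block] = None


def simulate(procs: int, blocks: int, capacity: int, capture: bool) -> int:
    """Total misses when each processor reads every shared block once.

    Processor p reads block (p + step) % blocks at each step, so every
    access is a first touch: without capture, all of them are compulsory
    misses. With capture (hypothetical model), each block fetched on a
    miss is also installed in every other processor's cache.
    """
    caches = [LRUCache(capacity) for _ in range(procs)]
    misses = 0
    for step in range(blocks):
        fetched = []
        for p in range(procs):
            block = (p + step) % blocks
            if not caches[p].lookup(block):
                misses += 1
                caches[p].insert(block)
                fetched.append((p, block))
        if capture:
            # Other processors snoop the reply and keep a copy.
            for src, block in fetched:
                for q in range(procs):
                    if q != src:
                        caches[q].insert(block)
    return misses


if __name__ == "__main__":
    for cap in (0, 4, 8, 16):
        print(cap, simulate(4, 16, cap, False), simulate(4, 16, cap, True))
```

Without capture, the miss count in this workload equals procs * blocks at every cache size, because each access is a first touch. With capture and a cache large enough to hold all shared blocks, each block is missed only once system-wide. At small sizes, plain LRU can evict a captured block before the processor that needs it gets there, which is one reason the replacement policy matters.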
URI: http://dspace.library.drexel.edu/handle/1860/254
Appears in Collections: Drexel Theses and Dissertations

Files in This Item:

File                  Size     Format
narravula_thesis.pdf  1.13 MB  Adobe PDF

Items in iDEA are protected by copyright, with all rights reserved, unless otherwise indicated.