Efficient parallel implementations of controlled optimization of traffic phases

Abstract

Finding optimal phase durations for a controlled intersection is a computationally intensive task requiring O(N^3) operations. In this paper we introduce a cost-optimal parallelization of a dynamic programming algorithm that reduces the complexity to O(N^2). We develop three implementations that span a wide range of parallel hardware. The first is based on a shared-memory architecture and uses the OpenMP programming model. The second is based on message passing and targets massively parallel machines, including high-performance clusters and supercomputers. The third is based on the data-parallel programming model mapped onto Graphics Processing Units (GPUs). Key optimizations include loop reversal, communication pruning, load balancing, and efficient thread-to-processor assignment. Experiments were conducted on an 8-core server, an IBM BlueGene/L supercomputer (two node boards with 128 processors), and an Nvidia GeForce GTX 470 GPU with 448 cores. Results indicate practical scalability on all platforms, with a maximum speedup of 76x on the GTX 470. © 2011 Springer-Verlag.
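The abstract does not give the recurrence itself, so the following is only a generic sketch of why a staged dynamic program with O(N^3) sequential work can run in O(N^2) time on N workers: within one stage, each state's value depends only on the previous stage, so the N states can be evaluated independently. The cost function `transition_cost`, the helper names, and the recurrence are all hypothetical and are not taken from the paper. Python threads are used purely to show the dependency structure; they are not expected to reproduce the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def transition_cost(s, t):
    # Hypothetical cost of moving from state s to state t (not from the paper).
    return (t - 2 * s) ** 2 + s % 3


def best_for_state(prev, t):
    # O(N) scan over predecessor states s <= t for a single state t.
    return min(prev[s] + transition_cost(s, t) for s in range(t + 1))


def dp_sequential(n):
    # N stages x N states x O(N) scan per state = O(N^3) work.
    prev = [0] * n
    for _ in range(n):
        prev = [best_for_state(prev, t) for t in range(n)]
    return prev


def dp_parallel(n, workers=4):
    # Same recurrence; states within a stage are independent, so they are
    # mapped across workers. Stages remain sequential.
    prev = [0] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n):
            # list() waits for the whole stage before the next one starts.
            prev = list(pool.map(partial(best_for_state, prev), range(n)))
    return prev


if __name__ == "__main__":
    n = 16
    print(dp_parallel(n) == dp_sequential(n))
```

With one worker per state, each stage costs O(N) time, giving O(N^2) time overall while the total work stays O(N^3), which is what cost-optimality means in this setting.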

Samra, S., El-Mahdy, A., Gomaa, W., Wada, Y., & Shoukry, A. (2011). Efficient parallel implementations of controlled optimization of traffic phases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7016 LNCS, pp. 270–281). https://doi.org/10.1007/978-3-642-24650-0_23
