A scalable on-line multilevel distributed network fault detection/monitoring system based on the SNMP protocol

Abstract

Traditional centralized network management solutions do not scale to present-day large-scale computer/communication networks. They also suffer from other drawbacks: a single point of failure, and hence a lack of fault tolerance, and heavy communication costs at the central manager. It has been recognized that decentralized/distributed solutions can solve some of these problems. For this reason, there has been considerable recent interest in distributed/decentralized network management applications, and this research trend motivates our work. We present the design and evaluation of an SNMP-based distributed network fault detection/monitoring system. The design integrates our recently developed ML-ADSD algorithm, which diagnoses faults in a distributed system of processors, into the SNMP framework. The ML-ADSD algorithm uses the multilevel paradigm and is scalable in the sense that only minor modifications are required to adapt it to networks of varying sizes. The system allows processors to fail and/or recover during diagnosis, and is therefore fault tolerant. We demonstrate the system by implementing it on an Ethernet network of 32 machines. Our results establish that the diagnosis latency (or time to termination) is much better than that of earlier solutions. The bandwidth utilization of our system is also negligible, demonstrating that deployment in a real network environment is practical. In this work we have thus successfully integrated three modern disciplines: network management, distributed computing, and system-level diagnosis.

Su, M. S., Thulasiraman, K., & Das, A. (2002). A scalable on-line multilevel distributed network fault detection/monitoring system based on the SNMP protocol. In Conference Record / IEEE Global Telecommunications Conference (Vol. 2, pp. 1960–1964). https://doi.org/10.1109/glocom.2002.1188542
