Apache Flume

  • Vohra, D.

Abstract

Apache Flume is a distributed, efficient, and reliable service, built on streaming data flows, for collecting, aggregating, and transferring large quantities of data. A unit of data flow in Flume is called an event. The main components of the Flume architecture are the Flume source, the Flume channel, and the Flume sink, all of which are hosted by a Flume agent. A Flume source consumes events from an external source, such as a log file or a web server, and stores the events it receives in a passive data store called a Flume channel. Examples of Flume channel types are the JDBC channel, the file channel, and the memory channel. The Flume sink removes events from the channel and writes them to external storage such as HDFS; a sink can also forward events to another Flume source, to be processed by another Flume agent. The Flume architecture for a single-hop data flow is shown in Figure 6-1.
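To make the single-hop flow concrete, the sketch below shows what a minimal agent configuration along these lines could look like. Flume agents are configured through a Java properties file; the agent name a1, the component names, the tailed log path, and the HDFS URL are illustrative placeholders rather than values taken from the chapter.

  # Minimal single-hop agent: tail a log file, buffer events in memory, write to HDFS.
  # Component names (a1, r1, c1, k1) and all paths are assumptions for illustration.
  a1.sources  = r1
  a1.channels = c1
  a1.sinks    = k1

  # Flume source: consumes events from an external source (here, a tailed log file)
  a1.sources.r1.type     = exec
  a1.sources.r1.command  = tail -F /var/log/app/app.log
  a1.sources.r1.channels = c1

  # Flume channel: passive store that buffers events between source and sink
  a1.channels.c1.type     = memory
  a1.channels.c1.capacity = 10000

  # Flume sink: removes events from the channel and writes them to external storage (HDFS)
  a1.sinks.k1.type          = hdfs
  a1.sinks.k1.channel       = c1
  a1.sinks.k1.hdfs.path     = hdfs://namenode:8020/flume/events
  a1.sinks.k1.hdfs.fileType = DataStream

Such a configuration is typically started with the flume-ng launcher, for example: flume-ng agent --name a1 --conf conf --conf-file flume.conf. A multi-hop flow is obtained by pointing a sink (for example, an Avro sink) at the Avro source of a second agent.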

Citation (APA)

Vohra, D. (2016). Apache Flume. In Practical Hadoop Ecosystem (pp. 287–300). Apress. https://doi.org/10.1007/978-1-4842-2199-0_6
