Apache Flume is an efficient, reliable, distributed service for collecting, aggregating, and transferring large quantities of data as streaming data flows. The unit of data flow in Flume is called an event. The main components in the Flume architecture are the Flume source, the Flume channel, and the Flume sink, all of which are hosted by a Flume agent. A Flume source consumes events from an external origin such as a log file or a web server and stores them in a passive staging area called a Flume channel; examples of channel types are the JDBC channel, the file channel, and the memory channel. The Flume sink removes events from the channel and writes them to an external store such as HDFS. A sink can also forward events to another Flume source for processing by another Flume agent. The Flume architecture for a single-hop data flow is shown in Figure 6-1.
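The source-channel-sink wiring described above is declared in an agent's properties file. The following is a minimal sketch of a single-hop configuration; the agent name (agent1), component names (src1, ch1, sink1), log path, and HDFS URL are illustrative assumptions, not values from the text.

```
# Name the components hosted by this agent (agent1 is a hypothetical name)
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Flume source: consume events by tailing a log file (path is illustrative)
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

# Flume channel: a memory channel buffering events between source and sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

# Flume sink: remove events from the channel and write them to HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events
agent1.sinks.sink1.channel = ch1
```

Such a configuration would typically be started with the flume-ng launcher, for example: flume-ng agent --conf conf --conf-file example.conf --name agent1.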
Vohra, D. (2016). Apache Flume. In Practical Hadoop Ecosystem (pp. 287–300). Apress. https://doi.org/10.1007/978-1-4842-2199-0_6