Kudu is just a storage engine, so you need a way to get data into and out of it. As Cloudera’s default big data processing framework, Spark is the ideal data processing and ingestion tool for Kudu. Spark not only provides excellent scalability and performance; Spark SQL and the DataFrame API also make it easy to interact with Kudu.
Quinto, B. (2018). High Performance Data Processing with Spark and Kudu. In Next-Generation Big Data (pp. 159–229). Apress. https://doi.org/10.1007/978-1-4842-3147-0_6