Scalable hardware architecture for fast gradient boosted tree training

Abstract

Gradient Boosted Tree is a powerful machine learning method that supports both classification and regression and is widely used in fields requiring high-precision prediction, particularly on various kinds of tabular data sets. Owing to recent increases in data size and the number of attributes, and to the demand for frequent model updates, fast and efficient training is required. FPGAs are well suited to power-efficient acceleration because they can realize domain-specific hardware architectures; however, such an architecture must flexibly support many hyper-parameters to adapt to various dataset sizes, dataset properties, and system constraints such as memory and logic capacity. We introduce a fully pipelined hardware implementation of Gradient Boosted Tree training, together with a design framework that enables a versatile, high-performance hardware system description flexible enough to realize highly parameterized machine learning models. Experimental results show that our FPGA implementation achieves 11 to 33 times the training performance and more than 300 times the power efficiency of a state-of-the-art GPU-accelerated software implementation.
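For context, the sketch below shows the basic gradient boosted tree training loop that such an accelerator targets: each round fits a small regression tree to the negative gradient of the loss (for squared error, the residuals) and adds it to the ensemble. This is a minimal software illustration only; the hyper-parameter names and values are assumptions for the example and do not come from the paper, and the sketch omits the histogram-based split finding and pipelining that the hardware architecture is concerned with.

```python
# Minimal sketch of gradient boosted tree training for squared-error
# regression. Hyper-parameters (n_trees, learning_rate, max_depth) are
# illustrative placeholders, not values from the paper.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_gbt(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Fit an ensemble of regression trees by gradient boosting."""
    base = y.mean()                       # initial constant model
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        # For squared-error loss, the negative gradient is the residual.
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees

def predict_gbt(base, trees, X, learning_rate=0.1):
    """Sum the base value and the scaled contributions of all trees."""
    pred = np.full(len(X), base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```

In a hardware realization, the per-round tree construction (gradient accumulation and split-gain evaluation over all attributes) dominates the runtime, which is why it is the natural target for a fully pipelined design.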

Citation (APA)

Sadasue, T., Tanaka, T., Kasahara, R., Darmawan, A., & Isshiki, T. (2021). Scalable hardware architecture for fast gradient boosted tree training. IPSJ Transactions on System LSI Design Methodology, 14, 11–20. https://doi.org/10.2197/IPSJTSLDM.14.11
