Abstract
Emerging generative models can create 3D objects from text prompts. However, deploying these models on mobile devices is challenging due to resource constraints and user demand for real-time performance. We take a first step towards understanding the bottlenecks by performing a measurement study of three recent text-to-3D generative models (Point-E, Shap-E, and CLIP-Mesh) in terms of their runtime GPU memory usage, latency, and synthesis quality. We investigate the effectiveness of quantization and distillation techniques in overcoming these challenges by speeding up inference, potentially at the expense of quality. We find that the Shap-E model is promising for mobile deployment but requires further optimization of its bottleneck diffusion step for real-time performance, as well as reduced memory usage and load times. Further work is needed on custom optimizations for generative text-to-3D models, including targeting specific metrics at each computation stage, efficient representations of 3D objects, and adaptive network and system support for resource-hungry models.
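As a rough illustration of the kind of runtime measurements the abstract refers to, the sketch below times a single inference stage and records its peak GPU memory with PyTorch. It is a minimal sketch under stated assumptions: `load_model` and `model.sample` are hypothetical placeholders, not the actual Point-E or Shap-E interfaces described in the paper.

```python
# Minimal sketch: per-stage latency and peak GPU memory measurement.
# The model loader and its `sample` method are hypothetical stand-ins.
import time
import torch

def profile_stage(fn, *args, device="cuda", **kwargs):
    """Run one inference stage and return (output, latency_s, peak_mem_bytes)."""
    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)           # start from a quiet GPU queue
    start = time.perf_counter()
    out = fn(*args, **kwargs)
    torch.cuda.synchronize(device)           # wait for all launched kernels
    latency = time.perf_counter() - start
    peak_mem = torch.cuda.max_memory_allocated(device)
    return out, latency, peak_mem

# Hypothetical usage, timing the diffusion (sampling) stage that the paper
# identifies as Shap-E's bottleneck:
# model = load_model("shap-e").to("cuda")                     # assumed loader
# latents, t, mem = profile_stage(model.sample, prompt="a red chair")
# print(f"diffusion stage: {t:.2f}s, peak GPU memory {mem / 2**20:.0f} MiB")
```

The same wrapper could be applied around each stage of a pipeline (text encoding, diffusion, mesh or point-cloud decoding) to attribute latency and memory to individual steps, which is the style of per-stage accounting the measurement study argues for.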
Citation
Zhang, X., Li, Z., Oymak, S., & Chen, J. (2023). Text-To-3D Generative AI on Mobile Devices: Measurements and Optimizations. In Proceedings of the 2023 Workshop on Emerging Multimedia Systems, EMS 2023 (pp. 8–14). Association for Computing Machinery, Inc. https://doi.org/10.1145/3609395.3610594