Abstract
Large language models (LLMs) are rapidly being deployed in question-answering and support systems to automate customer experience across many domains, including medical use cases. Models in such environments must handle multiple tasks: general knowledge questions, queries to external sources, function calling, and more. Some cases may not require full text generation at all; they may need different prompts or even different models. All of this can be managed by a routing step. This paper focuses on interpretable few-shot approaches to conversation routing, such as latent embeddings retrieval. The work presents a benchmark, a thorough analysis, and a set of visualizations of how latent embeddings routing behaves on long-context conversations in a multilingual, domain-specific environment. The results show that the latent embeddings router achieves performance on par with LLM-based routers while offering additional interpretability and a higher level of control over model decision-making.
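The few-shot latent embeddings routing described above can be sketched roughly as follows: each route is represented by a handful of example utterances embedded into a shared vector space, and an incoming query is assigned to the route of its nearest example by cosine similarity. This is a minimal illustration only; the route names, example vectors, and threshold below are placeholders, and a real system would embed text with a multilingual sentence-embedding model rather than use hand-written vectors.

```python
import numpy as np

# Hypothetical few-shot examples per route, already embedded.
# In practice these would come from an embedding model applied to
# example utterances for each route.
ROUTE_EXAMPLES = {
    "faq": [np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])],
    "function_call": [np.array([0.0, 1.0, 0.0])],
    "retrieval": [np.array([0.0, 0.0, 1.0])],
}


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def route(query_vec: np.ndarray, examples=ROUTE_EXAMPLES, threshold=0.5):
    """Return (route, score) for the closest few-shot example.

    Falls back to (None, threshold) when no example clears the
    threshold. The decision stays interpretable: the matched example
    explains why the query was routed where it was.
    """
    best_route, best_score = None, threshold
    for name, vecs in examples.items():
        for v in vecs:
            s = cosine(query_vec, v)
            if s > best_score:
                best_route, best_score = name, s
    return best_route, best_score
```

A query vector close to the "faq" examples would be routed there, while a vector dissimilar to every example falls through to the `None` fallback, which can then trigger a default prompt or model.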
Maksymenko, D., & Turuta, O. (2024). Interpretable Conversation Routing via the Latent Embeddings Approach. Computation, 12(12). https://doi.org/10.3390/computation12120237