As the demand for mobile intelligence continues to grow, researchers have made a significant breakthrough in customizing large language models (LLMs) for individual mobile apps. A novel proxy submodel tuning framework called LiteMoE has been developed to address the challenges of limited on-device resources and restricted control over the foundation LLM.
LiteMoE enables mobile apps to fine-tune customized adapters on device using proxy submodels, reporting up to a 12.7% accuracy improvement and a 66% memory reduction compared to operating the original foundation LLM.
This framework could unlock new possibilities for mobile intelligence on devices with limited resources: even though apps have only restricted control over the shared foundation LLM, they can still customize their services without retraining the entire model.
The increasing demand for mobile intelligence has led to deploying large language models (LLMs) on devices. However, current practices struggle to customize these models for individual mobile apps due to limited on-device resources and restricted control over the foundation LLM. The authors address this issue with a novel proxy submodel tuning framework called LiteMoE.
LiteMoE enables mobile apps to efficiently fine-tune customized adapters on device using proxy submodels. Its key technique is a post-training submodel extraction method that identifies critical experts, matches and merges moderate experts, and extracts a lightweight yet effective proxy submodel from the foundation LLM for a given app. As a result, mobile apps can customize their services without retraining the entire LLM.
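The paper's extraction method is only described at this level of detail here, but a minimal sketch of the idea can be written down. The sketch below assumes per-expert activation frequencies collected on the app's own data, fixed thresholds for "critical" and "moderate" experts, and a frequency-weighted average as the merge rule; all of these are illustrative assumptions, not details from the paper.

```python
import numpy as np

def extract_proxy_submodel(experts, activation_freq,
                           critical_thresh=0.25, drop_thresh=0.05):
    """Shrink one MoE layer's expert set for a single app.

    experts         : list of (d, d) expert weight matrices from the foundation LLM
    activation_freq : how often each expert fires on the app's own data (sums to ~1)
    Returns (proxy_experts, index_map): critical experts kept verbatim,
    moderate experts merged into one averaged expert, rare experts dropped.
    """
    freq = np.asarray(activation_freq, dtype=float)
    critical = [i for i, f in enumerate(freq) if f >= critical_thresh]
    moderate = [i for i, f in enumerate(freq) if drop_thresh <= f < critical_thresh]

    proxy_experts = [experts[i] for i in critical]      # keep critical experts as-is
    index_map = {i: pos for pos, i in enumerate(critical)}

    if moderate:                                        # merge moderate experts
        w = freq[moderate] / freq[moderate].sum()       # frequency-weighted average
        merged = sum(wi * experts[i] for wi, i in zip(w, moderate))
        proxy_experts.append(merged)
        index_map.update({i: len(proxy_experts) - 1 for i in moderate})
    return proxy_experts, index_map                     # everything else is dropped

rng = np.random.default_rng(0)
foundation_experts = [rng.normal(size=(16, 16)) for _ in range(8)]
freq = [0.30, 0.28, 0.12, 0.10, 0.08, 0.06, 0.04, 0.02]
proxy_experts, index_map = extract_proxy_submodel(foundation_experts, freq)
print(len(proxy_experts))  # 3: two critical experts plus one merged expert
```

The thresholds and the plain weighted average are placeholders; the paper's "matching" step implies a richer criterion for deciding which moderate experts can be merged.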
LiteMoE has been evaluated across various MoE-based LLMs and mobile computing tasks. The results show that with LiteMoE, mobile apps achieve significant improvements in accuracy (up to 12.7%) and memory reduction (66%) compared to operating the original foundation LLM. This demonstrates the potential of LiteMoE for enabling customized LLM serving on resource-limited devices.
The customization of large language models (LLMs) for mobile intelligence is a complex task due to several challenges. Firstly, the limited on-device resources make it difficult to deploy and fine-tune LLMs that require significant computational power and memory. Secondly, mobile apps have restricted control over the foundation LLM, making it challenging to customize their services without affecting the overall performance of the device.
These challenges arise from current practices attempting to deploy a system-level mixture-of-experts (MoE)-based foundation LLM shared by multiple mobile apps on a device. However, this approach does not take into account the unique requirements and data of individual mobile apps, leading to suboptimal performance and inefficient resource use.
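For context, an MoE layer activates only a small subset of experts per token, which is what makes a shared foundation model efficient but also unevenly used across apps. The sketch below is a minimal, hypothetical top-k gating layer in NumPy; the names and dimensions are illustrative and not taken from the paper.

```python
import numpy as np

def moe_layer_output(x, gate_w, experts, k=2):
    """One MoE layer: route token x to its top-k experts by gate score.

    x       : (d,) token hidden state
    gate_w  : (num_experts, d) router weights
    experts : list of (d, d) expert weight matrices
    k       : number of experts activated per token
    """
    scores = gate_w @ x                              # one score per expert
    top = np.argsort(scores)[-k:]                    # indices of the k best experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                         # softmax over selected experts
    # Only the selected experts run; the rest stay idle (sparse activation).
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
x = rng.normal(size=d)
gate_w = rng.normal(size=(num_experts, d))
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]
y = moe_layer_output(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

Because different apps' inputs route to different experts, per-app activation statistics are exactly the signal a submodel extraction method can exploit.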
LiteMoE addresses these challenges through its proxy submodel tuning framework, built on the post-training submodel extraction method outlined above: for each app, it identifies the critical experts, matches and merges experts of moderate importance, and drops the rest, producing a lightweight proxy submodel of the foundation LLM.
Because each proxy submodel is tailored to the specific needs of one app, customized adapters can be fine-tuned efficiently on device against the proxy, and the customization never requires retraining the entire LLM.
The use of LiteMoE in customizing large language models (LLMs) offers several key benefits. Firstly, it enables mobile apps to achieve significant improvements in accuracy (up to 12.7%) and memory reduction (66%) compared to operating the original foundation LLM.
Secondly, LiteMoE allows for efficient fine-tuning of customized adapters on devices using proxy submodels, which reduces the computational power and memory required by the LLM. This makes it possible to deploy and fine-tune LLMs on resource-limited devices, enabling mobile intelligence applications that were previously not feasible.
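The source does not specify the adapter architecture. A common choice for lightweight on-device fine-tuning is a LoRA-style low-rank adapter, sketched below in NumPy; the class name, rank, and dimensions are illustrative assumptions, and this is one plausible adapter design rather than the paper's own.

```python
import numpy as np

class LoRAAdapter:
    """Low-rank adapter over a frozen weight: effective weight is W + B @ A.

    Only A (rank x d_in) and B (d_out x rank) are trained, so the trainable
    parameter count is rank * (d_in + d_out) instead of d_in * d_out.
    """
    def __init__(self, frozen_w, rank=4, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = frozen_w.shape
        self.W = frozen_w                        # frozen: never updated on device
        self.A = rng.normal(scale=0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))         # zero init: adapter starts as a no-op

    def forward(self, x):
        return self.W @ x + self.B @ (self.A @ x)

    def trainable_params(self):
        return self.A.size + self.B.size

W = np.zeros((512, 512))                         # stands in for one proxy-submodel weight
adapter = LoRAAdapter(W, rank=8)
print(adapter.trainable_params(), W.size)        # 8192 vs 262144, about 3% of the layer
```

The point of the sketch is the parameter arithmetic: training only the low-rank factors is what keeps per-app fine-tuning within on-device memory budgets.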
Thirdly, LiteMoE provides a flexible and scalable solution for customizing LLMs, allowing mobile apps to adapt their services to changing requirements and data without requiring additional retraining of the entire LLM. This makes it an attractive solution for developers who need to customize their LLMs for specific use cases.
The implications of using LiteMoE in customizing large language models (LLMs) are significant. Firstly, it enables mobile intelligence applications that were previously not feasible due to limited on-device resources and restricted control over the foundation LLM.
Secondly, because each app tunes only a lightweight adapter against its own proxy submodel, customization stays decoupled from the shared foundation model: apps can adapt their services to changing requirements and data without retraining the entire LLM.
Thirdly, the use of LiteMoE in customizing LLMs has the potential to revolutionize the field of mobile intelligence by enabling more efficient and effective use of resources. By extracting proxy submodels that are tailored to the specific needs of each app, LiteMoE enables developers to create customized LLMs that are optimized for performance and efficiency.
There are numerous future research directions in customizing large language models (LLMs). Firstly, LiteMoE's potential for enabling customized LLM serving on resource-limited devices warrants further investigation.
Secondly, researchers should explore ways to improve the accuracy and efficiency of proxy submodel extraction methods, such as using more advanced machine learning techniques or incorporating domain knowledge into the extraction process.
Thirdly, there is a need to develop more scalable and flexible solutions for customizing LLMs that can adapt to changing requirements and data without requiring additional retraining of the entire LLM. This could involve developing new architectures or algorithms that enable efficient fine-tuning of customized adapters on devices using proxy submodels.
Finally, researchers should investigate the potential applications of LiteMoE in other domains beyond mobile intelligence, such as natural language processing, computer vision, and robotics.
Publication details: “LiteMoE: Customizing On-device LLM Serving via Proxy Submodel Tuning”
Publication Date: 2024-11-04
Authors: Yan Zhuang, Zhenzhe Zheng, Fan Wu, Guihai Chen, et al.
Source:
DOI: https://doi.org/10.1145/3666025.3699355
