Customized AI Models Boost Mobile Intelligence Accuracy by 12.7% and Cut Memory Use by 66%

As the demand for mobile intelligence continues to grow, researchers have made a significant breakthrough in customizing large language models (LLMs) for individual mobile apps. A novel proxy submodel tuning framework called LiteMoE has been developed to address the challenges of limited on-device resources and restricted control over the foundation LLM.

LiteMoE enables mobile apps to fine-tune customized adapters on devices using proxy submodels, improving accuracy while cutting memory use compared to operating the original foundation LLM. The results are impressive: mobile apps using LiteMoE show a 12.7% accuracy improvement and a 66% memory reduction.

This innovative framework has the potential to unlock new possibilities for mobile intelligence on resource-limited devices. It works within the restricted control that apps have over the foundation LLM, enabling them to customize their services without retraining the entire model.

The increasing demand for mobile intelligence has led to the deployment of large language models (LLMs) on devices. However, current practices struggle to customize these models for individual mobile apps due to limited on-device resources and restricted control over the foundation LLM. LiteMoE, a novel proxy submodel tuning framework, was proposed to address this issue.

LiteMoE enables mobile apps to efficiently fine-tune customized adapters on devices using proxy submodels. The key technique behind LiteMoE is a post-training submodel extraction method that identifies critical experts, matches and merges moderate experts, and extracts a lightweight and effective proxy submodel from the foundation LLM for a given app. This approach allows mobile apps to customize their services without retraining the entire LLM.

LiteMoE has been evaluated on various MoE-based LLMs and mobile computing tasks. The results show that with LiteMoE, mobile apps achieve significant improvements in accuracy (12.7%) and memory use (a 66% reduction) compared to operating the original foundation LLM. This demonstrates the potential of LiteMoE for enabling customized LLM serving on resource-limited devices.

Customizing LLMs for mobile intelligence is a complex task due to several challenges. Firstly, limited on-device resources make it difficult to deploy and fine-tune models that require significant computational power and memory. Secondly, mobile apps have only restricted control over the foundation LLM, making it challenging for them to customize their services without affecting the overall performance of the device.

These challenges arise because current practice deploys a single system-level mixture-of-experts (MoE)-based foundation LLM shared by all mobile apps on a device. This shared model cannot account for the unique requirements and data of individual apps, leading to suboptimal performance and inefficient resource use.
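
To make the setting concrete, the sketch below shows a minimal MoE feed-forward layer with top-k token routing, the kind of building block such foundation LLMs are made of. The layer sizes, expert count, and `top_k` value are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a mixture-of-experts (MoE) feed-forward layer with
# top-k routing. Dimensions and expert count are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its top-k experts.
        gate_probs = F.softmax(self.router(x), dim=-1)
        weights, indices = torch.topk(gate_probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: 8 experts, each token activates only 2 of them.
layer = MoELayer(d_model=64, d_hidden=256, num_experts=8)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because only a few experts fire per token, different apps tend to exercise different subsets of experts, which is the property a proxy submodel can exploit.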

LiteMoE addresses these challenges through its novel proxy submodel tuning framework. Its core technique is a post-training submodel extraction method: it identifies the critical experts, matches and merges the moderate ones, and extracts from the foundation LLM a lightweight yet effective proxy submodel tailored to a given app.
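
The paper's exact selection and merging criteria are not reproduced here; the hedged sketch below illustrates the idea using per-expert routing frequency on app-specific data as a stand-in criterion. The function name `extract_proxy_experts`, the keep/merge ratios, and the frequency criterion are all illustrative assumptions.

```python
# Hedged sketch of post-training proxy submodel extraction: keep the
# most-used ("critical") experts, average the moderately used ones into a
# single merged expert, and drop the rest. Illustrative, not the paper's
# exact algorithm; router remapping is omitted for brevity.
import copy
import torch

@torch.no_grad()
def extract_proxy_experts(experts, usage_counts, keep_ratio=0.25, merge_ratio=0.25):
    """experts: same-architecture nn.Modules from one MoE layer;
    usage_counts: how often each expert was routed to on the app's data."""
    order = sorted(range(len(experts)), key=lambda e: usage_counts[e], reverse=True)
    n_keep = max(1, int(len(experts) * keep_ratio))
    n_merge = int(len(experts) * merge_ratio)
    proxy = [experts[e] for e in order[:n_keep]]  # critical experts, kept intact

    moderate = order[n_keep:n_keep + n_merge]
    if moderate:
        # Merge moderate experts by usage-weighted parameter averaging.
        total = sum(usage_counts[e] for e in moderate)
        merged = copy.deepcopy(experts[moderate[0]])
        state = {k: torch.zeros_like(v) for k, v in merged.state_dict().items()}
        for e in moderate:
            w = usage_counts[e] / total
            for k, v in experts[e].state_dict().items():
                state[k] += w * v
        merged.load_state_dict(state)
        proxy.append(merged)
    # Rarely used experts are dropped entirely.
    return proxy

# Example: shrink 8 small experts to 3 (2 kept + 1 merged).
experts = [torch.nn.Linear(16, 16) for _ in range(8)]
counts = [120, 90, 40, 35, 30, 5, 3, 1]
print(len(extract_proxy_experts(experts, counts)))  # 3
```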

This approach allows mobile apps to customize their services without retraining the entire LLM. By extracting a proxy submodel tailored to the specific needs of each app, LiteMoE enables efficient on-device fine-tuning of customized adapters.
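
The source does not spell out the adapter architecture; as one plausible reading, the sketch below fine-tunes a LoRA-style low-rank adapter on top of a frozen proxy layer. Freezing the base weights and training only the small adapter is what keeps on-device tuning cheap; LiteMoE's actual adapter design may differ.

```python
# Minimal sketch of on-device adapter tuning, assuming a LoRA-style
# low-rank adapter over a frozen proxy submodel layer. Illustrative only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the proxy submodel's weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Wrap a layer of the extracted proxy submodel; train only the adapter.
layer = LoRALinear(nn.Linear(64, 64))
opt = torch.optim.Adam(
    [p for p in layer.parameters() if p.requires_grad], lr=1e-3)
x, y = torch.randn(32, 64), torch.randn(32, 64)  # stand-in app-local data
for _ in range(3):  # a few illustrative training steps
    loss = ((layer(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```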

The use of LiteMoE offers several key benefits. Firstly, it enables mobile apps to achieve significant improvements in accuracy (12.7%) and memory use (a 66% reduction) compared to operating the original foundation LLM.

Secondly, LiteMoE allows for efficient fine-tuning of customized adapters on devices using proxy submodels, which reduces the computational power and memory required by the LLM. This makes it possible to deploy and fine-tune LLMs on resource-limited devices, enabling mobile intelligence applications that were previously not feasible.

Thirdly, LiteMoE provides a flexible and scalable solution for customizing LLMs, allowing mobile apps to adapt their services to changing requirements and data without requiring additional retraining of the entire LLM. This makes it an attractive solution for developers who need to customize their LLMs for specific use cases.

The implications of using LiteMoE in customizing large language models (LLMs) are significant. Firstly, it enables mobile intelligence applications that were previously not feasible due to limited on-device resources and restricted control over the foundation LLM.

Secondly, the use of LiteMoE has the potential to reshape the field of mobile intelligence by enabling more efficient and effective use of resources. By extracting proxy submodels tailored to the specific needs of each app, LiteMoE enables developers to create customized LLMs optimized for performance and efficiency.

There are numerous future research directions in customizing LLMs. Firstly, there is a need to further investigate LiteMoE's potential for enabling customized LLM serving on resource-limited devices.

Secondly, researchers should explore ways to improve the accuracy and efficiency of proxy submodel extraction methods, such as using more advanced machine learning techniques or incorporating domain knowledge into the extraction process.

Thirdly, there is a need to develop more scalable and flexible solutions for customizing LLMs that can adapt to changing requirements and data without requiring additional retraining of the entire LLM. This could involve developing new architectures or algorithms that enable efficient fine-tuning of customized adapters on devices using proxy submodels.

Finally, researchers should investigate the potential applications of LiteMoE in other domains beyond mobile intelligence, such as natural language processing, computer vision, and robotics.

Publication details: “LiteMoE: Customizing On-device LLM Serving via Proxy Submodel Tuning”
Publication Date: 2024-11-04
Authors: Yan Zhuang, Zhenzhe Zheng, Fan Wu, Guihai Chen, et al.
DOI: https://doi.org/10.1145/3666025.3699355
