Japanese researchers have developed Fugaku-LLM, a large language model with enhanced Japanese language capabilities, using the supercomputer Fugaku. The team, led by Professor Rio Yokota of Tokyo Institute of Technology and including members from Tohoku University, Fujitsu Limited, RIKEN, Nagoya University, CyberAgent Inc., and Kotoba Technologies Inc., used distributed training methods to optimize the model's performance. With 13 billion parameters, Fugaku-LLM outperforms other open models developed in Japan. It can be used for research and commercial purposes, potentially enabling innovative applications in fields such as AI for Science. The source code is available on GitHub, and the model is published on Hugging Face.
Fugaku-LLM: A New Large Language Model with Enhanced Japanese Language Capability
A team of researchers from several Japanese institutions and companies has developed a large language model (LLM) with enhanced Japanese language capability, named Fugaku-LLM. The team, led by Professor Rio Yokota of Tokyo Institute of Technology, Associate Professor Keisuke Sakaguchi of Tohoku University, Koichi Shirahata of Fujitsu Limited, Team Leader Mohamed Wahib of RIKEN, Associate Professor Koji Nishiguchi of Nagoya University, Shota Sasaki of CyberAgent, Inc., and Noriyuki Kojima of Kotoba Technologies Inc., used the RIKEN supercomputer Fugaku to train the model.
Fugaku-LLM, with its 13 billion parameters, is larger than the 7-billion-parameter models that have been widely developed in Japan. The model has demonstrated enhanced Japanese capability, scoring an average of 5.5 on the Japanese MT-Bench, the highest performance among open models trained on original data produced in Japan. Its benchmark performance on humanities and social sciences tasks reached a remarkably high score of 9.18.
Training Large Language Models on Fugaku
The researchers developed distributed training methods to train large language models on Fugaku. They ported the deep learning framework Megatron-DeepSpeed to Fugaku to optimize the performance of Transformers on the system, accelerated the dense matrix multiplication library used by Transformers, and improved communication performance by combining three types of parallelization (data, tensor, and pipeline parallelism) and accelerating the collective communication library on the Tofu interconnect D.
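As a conceptual illustration only (not the team's actual training code), the sketch below shows how a pool of compute ranks could be partitioned into tensor-, pipeline-, and data-parallel groups in the style of Megatron-DeepSpeed's 3D parallelism; the group sizes and rank ordering are assumptions chosen for clarity.

```python
# Conceptual sketch of 3D parallelism (tensor x pipeline x data).
# This is NOT the Fugaku-LLM training code; sizes and rank layout are illustrative.

def build_3d_groups(world_size: int, tensor_parallel: int, pipeline_parallel: int):
    """Partition ranks 0..world_size-1 into tensor-, pipeline-, and data-parallel groups."""
    assert world_size % (tensor_parallel * pipeline_parallel) == 0
    data_parallel = world_size // (tensor_parallel * pipeline_parallel)

    # Tensor-parallel groups: consecutive ranks share the weight shards of each layer.
    tensor_groups = [list(range(i, i + tensor_parallel))
                     for i in range(0, world_size, tensor_parallel)]

    # Pipeline-parallel groups: ranks at the same offset within each pipeline stage.
    stride = world_size // pipeline_parallel
    pipeline_groups = [list(range(i, world_size, stride)) for i in range(stride)]

    # Data-parallel groups: ranks holding identical model shards, which all-reduce
    # their gradients (the kind of collective optimized on the Tofu interconnect D).
    data_groups = [[stage * stride + replica * tensor_parallel + t
                    for replica in range(data_parallel)]
                   for stage in range(pipeline_parallel)
                   for t in range(tensor_parallel)]

    return tensor_groups, pipeline_groups, data_groups

# Example: 16 ranks split as tensor=2, pipeline=2, data=4 (illustrative numbers).
tp_groups, pp_groups, dp_groups = build_3d_groups(16, tensor_parallel=2, pipeline_parallel=2)
```

Combining the three axes lets the model weights be sharded across ranks (tensor and pipeline parallelism) while multiple replicas process different batches (data parallelism), which is why both fast matrix multiplication and fast collectives matter.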
The researchers also utilized proprietary Japanese data collected by CyberAgent, English data, and other data to train Fugaku-LLM. The source code of Fugaku-LLM is available on GitHub, and the model is available on Hugging Face. Fugaku-LLM can be used for research and commercial purposes as long as users comply with the license.
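For readers who want to try the published model, a minimal loading sketch with the Hugging Face transformers library might look as follows. The repository id "Fugaku-LLM/Fugaku-LLM-13B" and the example prompt are assumptions based on the public release; check the model card for the exact identifier and recommended prompt format.

```python
# Minimal sketch for loading the released model with Hugging Face transformers.
# The repository id below is assumed; verify it on the Hugging Face model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Fugaku-LLM/Fugaku-LLM-13B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a 13B model needs roughly 26 GB of memory in bf16
    device_map="auto",
)

prompt = "スーパーコンピュータ「富岳」とは"  # "What is the supercomputer Fugaku?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```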
Role of Each Institution in the Development of Fugaku-LLM
Each institution and company involved in the development of Fugaku-LLM played a specific role. Tokyo Institute of Technology provided general oversight and worked on the parallelization and communication acceleration of large language models. Tohoku University was responsible for collecting the training data and selecting the model. Fujitsu worked on accelerating computation and communication and implemented the pre-training and post-training fine-tuning. RIKEN worked on the distributed parallelization and communication acceleration of large language models. Nagoya University studied how to apply Fugaku-LLM to 3D generative AI. CyberAgent provided the training data, and Kotoba Technologies ported the deep learning framework to Fugaku.
Research Outcomes and Future Development
The research team significantly improved the computational performance of training large language models on the supercomputer Fugaku, increasing the speed of matrix multiplication by a factor of 6 and communication speed by a factor of 3. The knowledge gained from these efforts can inform the design of the next-generation computing infrastructure after Fugaku and will greatly enhance Japan's future advantage in the field of AI.
The team developed an easy-to-use, open, and secure large language model with 13 billion parameters. Because Fugaku-LLM was trained from scratch on the team's own data, the entire training process can be traced, making the model superior in terms of transparency and safety.
The results of this research are being made public through GitHub and Hugging Face so that other researchers and engineers can use them to further develop large language models. As more researchers and engineers participate in improving the models and their applications, training efficiency will improve, leading to next-generation innovative research and business applications.
