Alibaba Cloud PAI-Lingjun Large-Model Training Tool Pai-Megatron-Patch Is Now Open Source

2023-9-15 18:04 | Posted by: admin

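Summary: Authors: Li Peng, Wang Ming, Shi Chen, Huang Jun. Introduction: As deep learning large language models keep developing, their architectures and scale are evolving rapidly, and applications built on large-model technology continue to emerge. For developers, it is not only a matter of considering how, in complex and ever-changing scenarios, to effectively take what large models consume ...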

1) Model format conversion

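If you train the reward model (RM) and run reinforcement learning optimization (PPO) directly on a model in Hugging Face format, you can skip this step. If you start from a Megatron-format model, such as an SFT model trained with PAI-Megatron-Patch, first convert the Megatron checkpoint files to Hugging Face format with the conversion scripts we provide, and then run RM and PPO training.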
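Whether this step is needed depends only on the format of the checkpoint you start from. The sketch below makes that decision explicit. It assumes that a Hugging Face checkpoint saved with save_pretrained() has a config.json at its top level, and that a Megatron-LM checkpoint has a latest_checkpointed_iteration.txt tracker file. The function name detect_ckpt_format is hypothetical and is not part of PAI-Megatron-Patch.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide whether a checkpoint directory needs the
# Megatron -> Hugging Face conversion before RM/PPO training.
# Assumptions (not part of PAI-Megatron-Patch):
#   - a Hugging Face checkpoint written by save_pretrained() has config.json at its top level;
#   - a Megatron-LM checkpoint has latest_checkpointed_iteration.txt at its top level.
detect_ckpt_format() {
  local dir="$1"
  if [ -f "$dir/config.json" ]; then
    echo "huggingface"
  elif [ -f "$dir/latest_checkpointed_iteration.txt" ]; then
    echo "megatron"
  else
    echo "unknown"
  fi
}

# Example: only convert when the SFT checkpoint is in Megatron format.
# if [ "$(detect_ckpt_format /path/to/megatron_llama2_ckpt)" = "megatron" ]; then
#   echo "run the conversion script first"
# fi
```

An "unknown" result is deliberately not treated as either format, so a mistyped path does not silently skip the conversion.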

LLaMA2 model conversion:

cd PAI-Megatron-Patch/toolkits/model_checkpoints_convertor/gpt3_llama
bash model_convertor.sh \
  /path/to/Megatron-LM \
  /path/to/megatron_llama2_ckpt \
  /path/to/hf_llama2_ckpt \
  1 \
  1 \
  llama-7b \
  0 \
  true
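The eight positional arguments of model_convertor.sh are easy to mix up. A hypothetical wrapper that gives them names might look like the one below. The names and meanings in the comments (tensor-parallel size, pipeline-parallel size, extra vocabulary size, conversion direction) are our reading of the script and are assumptions to check against its header before use. Setting DRY_RUN=1 only prints the assembled command.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around gpt3_llama/model_convertor.sh.
# Argument meanings below are assumptions; check them against the script header.
# Each value can be overridden through the environment variable shown.
convert_llama2_to_hf() {
  local megatron_path="${MEGATRON_PATH:-/path/to/Megatron-LM}"  # $1: Megatron-LM source tree
  local src_ckpt="${SRC_CKPT:-/path/to/megatron_llama2_ckpt}"   # $2: Megatron-format input
  local dst_ckpt="${DST_CKPT:-/path/to/hf_llama2_ckpt}"         # $3: Hugging Face output dir
  local tp="${TP:-1}"                                           # $4: tensor-parallel size (assumed)
  local pp="${PP:-1}"                                           # $5: pipeline-parallel size (assumed)
  local model_name="${MODEL_NAME:-llama-7b}"                    # $6: model size preset
  local extra_vocab="${EXTRA_VOCAB_SIZE:-0}"                    # $7: extra vocabulary size (assumed)
  local mg2hf="${MG2HF:-true}"                                  # $8: true = Megatron -> HF (assumed)

  local -a cmd
  cmd=(bash model_convertor.sh "$megatron_path" "$src_ckpt" "$dst_ckpt"
       "$tp" "$pp" "$model_name" "$extra_vocab" "$mg2hf")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"   # print the assembled command instead of running it
  else
    "${cmd[@]}"        # must be run from the gpt3_llama convertor directory
  fi
}
```

For example, SRC_CKPT=/data/sft_ckpt DRY_RUN=1 convert_llama2_to_hf prints the command with only the source checkpoint changed. The path /data/sft_ckpt is illustrative.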

BLOOM model conversion:

cd PAI-Megatron-Patch/toolkits/model_checkpoints_convertor/bloom
bash model_convertor_huggingface_megatron.sh \
  /path/to/Megatron-LM \
  /path/to/megatron_bloom_ckpt \
  /path/to/hf_bloom_ckpt \
  1 \
  1 \
  true
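Before pointing RM or PPO training at the converted directory, a quick check that it really looks like a Hugging Face BLOOM checkpoint can save a failed launch. The sketch assumes the converter writes a standard config.json whose model_type is "bloom" ("llama" for LLaMA2), plus weights as *.bin or *.safetensors files. check_hf_ckpt is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical post-conversion check for a Hugging Face-format checkpoint directory.
# Assumes a standard config.json and weights stored as *.bin or *.safetensors.
check_hf_ckpt() {
  local dir="$1" expected_type="$2"   # e.g. bloom, or llama for LLaMA2
  if [ ! -f "$dir/config.json" ]; then
    echo "missing $dir/config.json" >&2
    return 1
  fi
  # Match "model_type": "<expected_type>" with any whitespace around the colon.
  if ! grep -Eq "\"model_type\"[[:space:]]*:[[:space:]]*\"$expected_type\"" "$dir/config.json"; then
    echo "$dir/config.json does not declare model_type $expected_type" >&2
    return 1
  fi
  # compgen -G succeeds only if the glob matches at least one file.
  if ! compgen -G "$dir/*.bin" >/dev/null && ! compgen -G "$dir/*.safetensors" >/dev/null; then
    echo "no *.bin or *.safetensors weights in $dir" >&2
    return 1
  fi
}

# Usage: check_hf_ckpt /path/to/hf_bloom_ckpt bloom && echo "ready for RM/PPO"
```

compgen -G is used instead of ls with two globs because ls fails whenever either pattern matches nothing. That would reject a directory that has *.bin weights but no *.safetensors files.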

2) DeepSpeed-Chat

Download the open-source DeepSpeed-Chat code from the community repository, patch it, and install its dependencies:

cd PAI-Megatron-Patch/rlhf/deepspeed-chat
git clone https://github.com/microsoft/DeepSpeedExamples.git
cp -f rm_main.py DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py
cp -f utils.py DeepSpeedExamples/applications/DeepSpeed-Chat/training/utils/utils.py
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt
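The two cp -f commands overwrite upstream DeepSpeed-Chat files with the versions shipped in PAI-Megatron-Patch. If you later want to diff against the upstream files or restore them, a variant that keeps a one-time backup could look like this. patch_file and the .orig suffix are our own conventions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: overwrite an upstream file with a patched one, keeping the
# upstream version as <dst>.orig the first time (later calls keep that backup).
patch_file() {
  local src="$1" dst="$2"
  if [ -f "$dst" ] && [ ! -f "$dst.orig" ]; then
    cp "$dst" "$dst.orig"   # preserve the upstream file once
  fi
  cp -f "$src" "$dst"
}

# Usage, mirroring the commands above (run from PAI-Megatron-Patch/rlhf/deepspeed-chat):
# DSC=DeepSpeedExamples/applications/DeepSpeed-Chat
# patch_file rm_main.py "$DSC/training/step2_reward_model_finetuning/main.py"
# patch_file utils.py   "$DSC/training/utils/utils.py"
```

Only the first call saves a backup. This is so that re-running the setup does not replace the upstream copy with an already patched file.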

Train the reward model (RM) based on LLaMA2, starting from the DeepSpeed-Chat directory:

cd training/step2_reward_model_finetuning/ && bash training_scripts/llama2/run_llama2_7b.sh
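Training runs are long, and piping their output through tee to keep a log has a catch. Without pipefail, a pipeline's exit status is that of its last command (tee), so a crashed run looks successful. A minimal sketch that keeps both the log and the real exit status is shown below. run_logged and the log file name are our own choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, copy its stdout and stderr to a log file,
# and return the command's own exit status rather than tee's.
run_logged() {
  local log="$1"; shift
  # The subshell keeps pipefail from leaking into the caller's shell.
  ( set -o pipefail; "$@" 2>&1 | tee "$log" )
}

# Usage from the step-2 directory:
# run_logged rm_train.log bash training_scripts/llama2/run_llama2_7b.sh
```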

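Run reinforcement learning optimization (PPO) based on LLaMA2, again starting from the DeepSpeed-Chat directory, because the path below is relative to it: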

cd training/step3_rlhf_finetuning/ && bash training_scripts/llama2/run_llama2_7b_lora.sh
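Both training commands use paths relative to the DeepSpeed-Chat directory. Run back to back in one shell, the second cd fails, because the first one left you inside step2_reward_model_finetuning. One way to chain the steps is sketched below with a hypothetical run_rm_then_ppo helper. Each step runs in a subshell so its directory change does not leak, and PPO starts only if RM training succeeded. The sketch does not change how run_llama2_7b_lora.sh locates the reward model.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run step 2 (RM) and then step 3 (PPO) from the DeepSpeed-Chat root.
# Each step runs in a subshell, so its "cd" does not change the caller's
# working directory, and step 3 starts only if step 2 exits with status 0.
run_rm_then_ppo() {
  local root="$1"   # path to DeepSpeedExamples/applications/DeepSpeed-Chat
  ( cd "$root/training/step2_reward_model_finetuning" \
      && bash training_scripts/llama2/run_llama2_7b.sh ) || return 1
  ( cd "$root/training/step3_rlhf_finetuning" \
      && bash training_scripts/llama2/run_llama2_7b_lora.sh ) || return 1
}

# Usage from PAI-Megatron-Patch/rlhf/deepspeed-chat:
# run_rm_then_ppo DeepSpeedExamples/applications/DeepSpeed-Chat
```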

3) trlx

Download the open-source trlx code from the community repository, copy in the PAI files, and install it:

cd PAI-Megatron-Patch/rlhf/trlx
git clone https://github.com/CarperAI/trlx.git
cp trlx_bloom_rlhf.py trlx_bloom_rlhf_test.py trlx/examples/summarize_rlhf/
cp train_reward_model_bloom.py reward_model_bloom.py ds_config_bloom.json trlx/examples/summarize_rlhf/reward_model/
cp -f ds_config_trlx_gptj_summarize.json trlx/examples/summarize_rlhf/configs/
cd trlx
pip install -e .
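The copied example and reward-model files live inside the cloned trlx tree, and pip install -e . installs the trlx package in editable mode from that same tree. If the tree is later updated (for example with git pull), it is worth confirming that the copies are still the PAI versions. The sketch below does this with a hypothetical verify_copied helper built on cmp.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that each source file has an identical copy in the
# destination directory. Arguments: destination directory, then the source files.
verify_copied() {
  local dest="$1"; shift
  local f status=0
  for f in "$@"; do
    # cmp -s is silent; it exits non-zero if the files differ or one is missing.
    if ! cmp -s "$f" "$dest/$(basename "$f")"; then
      echo "differs or missing: $dest/$(basename "$f")" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage from PAI-Megatron-Patch/rlhf/trlx:
# verify_copied trlx/examples/summarize_rlhf trlx_bloom_rlhf.py trlx_bloom_rlhf_test.py
# verify_copied trlx/examples/summarize_rlhf/reward_model \
#   train_reward_model_bloom.py reward_model_bloom.py ds_config_bloom.json
# verify_copied trlx/examples/summarize_rlhf/configs ds_config_trlx_gptj_summarize.json
```

The loop checks every file before returning, so a single run reports all mismatches rather than stopping at the first one.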
