SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning)
ModelScope Hub
📖 Table of Contents
- Introduction
- News
- LLM Training and Inference
- Installation
- Getting Started
- Learn More
- License
- Contact Us
📝 Introduction
SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible framework designed to facilitate lightweight model fine-tuning and inference. It integrates implementations of various efficient fine-tuning methods, embracing approaches that are parameter-efficient, memory-efficient, and time-efficient. SWIFT integrates seamlessly into the ModelScope ecosystem and can fine-tune various models, with a primary emphasis on LLMs and vision models. Additionally, SWIFT is fully compatible with PEFT, enabling users to leverage the familiar PEFT interface to fine-tune ModelScope models.
Currently supported approaches (and counting):
- LoRA: LoRA: Low-Rank Adaptation of Large Language Models
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
- Adapter: Parameter-Efficient Transfer Learning for NLP
- Prompt Tuning: Visual Prompt Tuning
- Side: Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks
- Res-Tuning: Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
- ROME: Rank-One Editing of Encoder-Decoder Models
- NEFTune: Noisy Embeddings Improve Instruction Finetuning
- All tuners offered on PEFT
Key features:
- By integrating the ModelScope library, models can be readily obtained via a model-id.
- Tuners provided by SWIFT can be combined, allowing exploration of multiple tuners on a model for the best result.
- Support calling activate_adapter, deactivate_adapter, or set_active_adapters to activate/deactivate tuners. Users can run inference with one model and multiple tuners in different threads independently.
- Support training and inference with scripts/CLI, as well as inference with a Web-UI.
- Support model deployment (vllm/chatglm.cpp/xinference). Check the official documentation for details.
Users can check the documentation of SWIFT for detailed tutorials.
🎉 News
- 2023.12.18: Support for VLLM for inference acceleration and deployment. For more details, refer to VLLM Inference Acceleration and Deployment.
- 2023.12.15: Support deepseek, deepseek-coder series: deepseek-7b, deepseek-7b-chat, deepseek-67b, deepseek-67b-chat, openbuddy-deepseek-67b-chat, deepseek-coder-1_3b, deepseek-coder-1_3b-chat, deepseek-coder-6_7b, deepseek-coder-6_7b-chat, deepseek-coder-33b, deepseek-coder-33b-chat.
- 2023.12.13: Support mistral-7b-chat-v2, mixtral-7b-moe, mixtral-7b-moe-chat.
- 2023.12.9: Support the freeze_parameters parameter as a compromise between LoRA and full-parameter fine-tuning. Corresponding shell scripts can be found at full_freeze_ddp. Support the disable_tqdm, lazy_tokenize, and preprocess_num_proc parameters; for details, please refer to Command-Line parameters.
- 2023.12.8: Support sus-34b-chat, yi-6b-200k, yi-34b-200k.
- 2023.12.7: Support Multi-Node DDP training.
- 2023.12.4: Supported models: zephyr-7b-beta-chat, openbuddy-zephyr-7b-chat. Supported datasets: hc3-zh, hc3-en.
- 🔥 2023.12.2: Best Practices for Self-cognition Fine-tuning: fine-tune an LLM for self-cognition in 10 minutes to create a personalized LLM.
- 🔥 2023.11.30: Support for training and inference of the qwen-1_8b, qwen-72b, and qwen-audio model series. The corresponding shell scripts can be viewed at qwen_1_8b_chat, qwen_72b_chat, qwen_audio_chat.
- 🔥 2023.11.29: Support the training and inference for AnimateDiff
- 🔥 2023.11.24: Support for yi-34b-chat, codefuse-codellama-34b-chat: The corresponding shell script can be found in yi_34b_chat, codefuse_codellama_34b_chat.
- 🔥 2023.11.18: Support for tongyi-finance-14b series models: tongyi-finance-14b, tongyi-finance-14b-chat, tongyi-finance-14b-chat-int4. The corresponding shell script can be found in tongyi_finance_14b_chat_int4.
- 2023.11.16: Added support for more models in flash attn: qwen series, qwen-vl series, llama series, openbuddy series, mistral series, yi series, ziya series. Please use the use_flash_attn parameter.
- 🔥 2023.11.11: NEFTune supported. Use it with Swift.prepare_model(model, NEFTuneConfig()).
- 🔥 2023.11.11: Support training and inference with CLI, and inference with Web-UI. Check the Run using Swift CLI chapter for details.
- 🔥 2023.11.11: Support model deployment (vllm/chatglm.cpp/xinference). Check the official documentation for details.
- 🔥 2023.11.10: Support for bluelm series models: bluelm-7b, bluelm-7b-chat, bluelm-7b-32k, bluelm-7b-chat-32k. The corresponding shell script can be found in bluelm_7b_chat.
- 🔥 2023.11.08: Support the finetuning of xverse-65b model, scripts can be found at: xverse_65b.
- 🔥 2023.11.07: Support the finetuning of yi-6b, yi-34b model, scripts can be found at: yi_6b, yi_34b.
More
- 🔥 2023.10.30: Support QA-LoRA and LongLoRA to decrease memory usage in training.
- 🔥 2023.10.30: Support ROME (Rank One Model Editing) to add or modify knowledge; no training is needed!
- 2023.10.30: Support for skywork-13b series models: skywork-13b, skywork-13b-chat. The corresponding shell script can be found in skywork_13b.
- 🔥 2023.10.27: Support for chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. The corresponding shell script can be found in chatglm3_6b.
- 🔥 2023.10.17: Supported int4, int8 models: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8.
- 2023.10.15: Supported ziya2-13b model series: ziya2-13b, ziya2-13b-chat.
- 2023.10.12: Supported mistral-7b model series: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat.
- 🔥 2023.10.7: Supported DeepSpeed ZeRO-2, enabling LoRA (not just QLoRA) to run DDP on 2*A10.
- 2023.10.4: Supported datasets in the fields of mathematics, law, SQL, and coding: blossom-math-zh, school-math-zh, text2sql-en, sql-create-context-en, lawyer-llama-zh, tigerbot-law-zh, leetcode-python-en.
- 🔥 2023.9.25: Supported qwen-14b model series: qwen-14b, qwen-14b-chat.
- 2023.9.18: Supported internlm-20b model series: internlm-20b, internlm-20b-chat.
- 2023.9.12: Supported training with MP+DDP to accelerate full-parameter fine-tuning speed.
- 2023.9.5: Supported openbuddy-llama2-70b-chat model.
- 2023.9.3: Supported baichuan2 model series: baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat.
✨ LLM Training and Inference
Simple Usage
- Self-cognition fine-tuning for large models in 10 minutes to create a personalized large model: see Best Practices for Self-cognition Fine-tuning.
- Quickly perform inference on LLM and build a Web-UI, see the LLM Inference Documentation.
- Rapidly fine-tune and perform inference on LLM, and build a Web-UI. See the LLM Fine-tuning Documentation.
- Utilize VLLM for inference acceleration and deployment. Please refer to VLLM Inference Acceleration and Deployment for more information.
- View the models and datasets supported by Swift: see supported models and datasets.
- Expand and customize models, datasets, and dialogue templates in Swift, see Customization and Expansion.
- Check command-line parameters for fine-tuning and inference, see Command-Line parameters.
- Compare training time and GPU memory usage under different parameters: see Benchmark.
Features
- Supported SFT Methods: lora, qlora, full (full-parameter fine-tuning)
- Supported Features: quantization, DDP, model parallelism, gradient checkpointing, pushing to modelscope hub, custom datasets, multimodal and agent SFT, multi-round chat, ...
- Supported Models: [Detail]
  - Multi-Modal:
    - qwen-vl series: qwen-vl, qwen-vl-chat, qwen-vl-chat-int4
    - qwen-audio series: qwen-audio, qwen-audio-chat
  - General:
    - qwen series: qwen-1_8b-chat, qwen-1_8b-chat-int4, qwen-1_8b-chat-int8, qwen-7b, qwen-7b-chat, qwen-7b-chat-int4, qwen-7b-chat-int8, qwen-14b, qwen-14b-chat, qwen-14b-chat-int4, qwen-14b-chat-int8, qwen-72b, qwen-72b-chat, qwen-72b-chat-int4, qwen-72b-chat-int8
    - chatglm series: chatglm2-6b, chatglm2-6b-32k, chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k
    - llama series: llama2-7b, llama2-7b-chat, llama2-13b, llama2-13b-chat, llama2-70b, llama2-70b-chat
    - yi series: yi-6b, yi-6b-200k, yi-6b-chat, yi-34b, yi-34b-200k, yi-34b-chat
    - deepseek series: deepseek-7b, deepseek-7b-chat, deepseek-67b, deepseek-67b-chat
    - openbuddy series: openbuddy-llama2-13b-chat, openbuddy-llama-65b-chat, openbuddy-llama2-70b-chat, openbuddy-mistral-7b-chat, openbuddy-zephyr-7b-chat, openbuddy-deepseek-67b-chat
    - mistral series: mistral-7b, mistral-7b-chat, mistral-7b-chat-v2, mixtral-7b-moe, mixtral-7b-moe-chat
    - baichuan series: baichuan-7b, baichuan-13b, baichuan-13b-chat, baichuan2-7b, baichuan2-7b-chat, baichuan2-13b, baichuan2-13b-chat, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4
    - internlm series: internlm-7b, internlm-7b-chat, internlm-7b-chat-8k, internlm-20b, internlm-20b-chat
    - xverse series: xverse-7b, xverse-7b-chat, xverse-13b, xverse-13b-chat, xverse-65b
    - bluelm series: bluelm-7b, bluelm-7b-chat, bluelm-7b-32k, bluelm-7b-chat-32k
    - zephyr series: zephyr-7b-beta-chat
    - ziya series: ziya2-13b, ziya2-13b-chat
    - skywork series: skywork-13b, skywork-13b-chat
    - sus series: sus-34b-chat
    - other: polylm-13b, seqgpt-560m
  - Financial:
    - tongyi-finance series: tongyi-finance-14b, tongyi-finance-14b-chat, tongyi-finance-14b-chat-int4
  - Coding:
    - codefuse series: codefuse-codellama-34b-chat
    - deepseek-coder series: deepseek-coder-1_3b, deepseek-coder-1_3b-chat, deepseek-coder-6_7b, deepseek-coder-6_7b-chat, deepseek-coder-33b, deepseek-coder-33b-chat
- Supported Datasets: [Detail]
  - NLP:
    - General: 🔥alpaca-en(gpt4), 🔥alpaca-zh(gpt4), multi-alpaca-all, instinwild-en, instinwild-zh, cot-en, cot-zh, firefly-all-zh, instruct-en, gpt4all-en, sharegpt-en, sharegpt-zh, tutu-v2-sft-mixture, wikipedia-zh, open-orca, open-orca-gpt4, sharegpt-gpt4
    - Agent: damo-agent-zh, 🔥damo-agent-mini-zh, 🔥agent-instruct-all-en
    - Coding: code-alpaca-en, 🔥leetcode-python-en, 🔥codefuse-python-en, 🔥codefuse-evol-instruction-zh
    - Medical: medical-en, medical-zh, medical-mini-zh
    - Law: 🔥lawyer-llama-zh, tigerbot-law-zh
    - Math: 🔥blossom-math-zh, school-math-zh, open-platypus-en
    - SQL: text2sql-en, 🔥sql-create-context-en
    - Text Generation: 🔥advertise-gen-zh, 🔥dureader-robust-zh
    - Classification: cmnli-zh, 🔥cmnli-mini-zh, 🔥jd-sentiment-zh, 🔥hc3-zh, 🔥hc3-en
    - Other: finance-en, poetry-zh, webnovel-zh, generated-chat-zh, cls-fudan-news-zh, ner-jave-zh
  - Multi-Modal:
    - Vision: coco-en, 🔥coco-mini-en
    - Audio: aishell1-zh, 🔥aishell1-mini-zh
  - Custom Dataset
- Supported Templates:
  - Text Generation: default-generation, default-generation-bos, chatglm-generation
  - Chat: default, chatml, baichuan, chatglm2, chatglm3, llama, openbuddy, internlm, yi, xverse, ziya, skywork, bluelm, zephyr, sus, deepseek
🛠️ Installation
SWIFT runs in a Python environment. Please make sure your Python version is 3.8 or higher.
- Install SWIFT with the pip command:
# full ability
pip install ms-swift[all] -U
# only use llm
pip install ms-swift[llm] -U
# only use aigc
pip install ms-swift[aigc] -U
# only use adapters
pip install ms-swift -U
- Install SWIFT from source (for running the sft/infer examples):
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]
SWIFT requires torch>=1.13.
- Use SWIFT in our docker image:
docker pull registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.1
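The version requirements stated above (Python 3.8 and torch>=1.13) can be checked before installing. The following is a minimal sketch using only the standard library; parse_version and check_environment are hypothetical helper names for illustration and are not part of SWIFT.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.0.1+cu118' into a tuple like (2, 0, 1)."""
    parts = []
    # Drop any local suffix ('+cu118'), then keep the leading digits of each piece.
    for piece in text.split('+')[0].split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append('Python 3.8 or newer is required')
    try:
        torch_version = metadata.version('torch')
    except metadata.PackageNotFoundError:
        problems.append('torch is not installed')
    else:
        if parse_version(torch_version) < (1, 13):
            problems.append(f'torch>=1.13 is required, found {torch_version}')
    return problems


if __name__ == '__main__':
    for problem in check_environment():
        print(problem)
```

An empty result only means these two checks passed; it does not verify CUDA or any other dependency.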
🚀 Getting Started
SWIFT supports multiple tuners, as well as tuners provided by PEFT. To use these tuners, simply call:
from swift import Swift, LoRAConfig
config = LoRAConfig(...)
model = Swift.prepare_model(model, config, extra_state_keys=['...'])
The code snippet above initializes the tuner with random weights. The input model is an instance of torch.nn.Module, and the config is a subclass instance of SwiftConfig or PeftConfig. extra_state_keys lists the extra module weights (such as the linear head) to be trained and stored in the output dir.
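To give a feel for what a LoRA tuner adds to a layer, here is a framework-free sketch of the idea from the LoRA paper: the base weight stays frozen, and a low-rank product B·A is added on top, with A random and B zero so the wrapped layer initially matches the base layer. This is an illustration under those assumptions, not SWIFT's implementation; LoRALinear, matvec, and the 0.02 standard deviation are hypothetical choices.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical, framework-free illustration of a LoRA-wrapped linear layer."""

    def __init__(self, weight, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, shape (out_dim, in_dim)
        # A is random and B is zero, so B @ A contributes nothing before training.
        self.lora_a = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(r)]
        self.lora_b = [[0.0] * r for _ in range(out_dim)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        # Only the two low-rank factors are trained: r * (in_dim + out_dim) values.
        return sum(len(row) for row in self.lora_a) + sum(len(row) for row in self.lora_b)


weight = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
layer = LoRALinear(weight, r=1)
x = [1.0, 0.0, -1.0]
print(layer.forward(x) == matvec(weight, x))  # the wrapped layer starts out identical to the base layer
print(layer.num_trainable())
```

The point of the design is that the number of trained values grows with r * (in_dim + out_dim) rather than in_dim * out_dim.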
You may combine multiple tuners by:
from swift import Swift, LoRAConfig, PromptConfig
model = Swift.prepare_model(model, {'lora': LoRAConfig(...), 'prompt': PromptConfig(...)})
Call save_pretrained and push_to_hub after fine-tuning:
from swift import push_to_hub
model.save_pretrained('some-output-folder')
push_to_hub('my-group/some-repo-id-modelscope', 'some-output-folder', token='some-ms-token')
Here, my-group/some-repo-id-modelscope is the model-id in the hub, and some-ms-token is the token for uploading.
Use the model-id for later inference:
from swift import Swift
model = Swift.from_pretrained(model, 'my-group/some-repo-id-modelscope')
Here is a runnable example:
import os
import tempfile
# Please install modelscope by `pip install modelscope`
from modelscope import Model
from swift import LoRAConfig, SwiftModel, Swift, push_to_hub
tmp_dir = tempfile.TemporaryDirectory().name
if not os.path.exists(tmp_dir):
    os.makedirs(tmp_dir)
model = Model.from_pretrained('modelscope/Llama-2-7b-ms', device_map='auto')
lora_config = LoRAConfig(target_modules=['q_proj', 'k_proj', 'v_proj'])
model: SwiftModel = Swift.prepare_model(model, lora_config)
# Do some finetuning here
model.save_pretrained(tmp_dir)
push_to_hub('my-group/swift_llama2', output_dir=tmp_dir)
model = Model.from_pretrained('modelscope/Llama-2-7b-ms', device_map='auto')
model = SwiftModel.from_pretrained(model, 'my-group/swift_llama2', device_map='auto')
This example uses transformers for model creation and SWIFT for efficient tuning.
from swift import Swift, LoRAConfig, AdapterConfig, PromptConfig
from transformers import AutoModelForImageClassification
# init vit model
model = AutoModelForImageClassification.from_pretrained("google/vit-base-patch16-224")
# init lora tuner config
lora_config = LoRAConfig(
    r=10,  # the rank of the LoRA module
    target_modules=['query', 'key', 'value'],  # modules to replace, matched by the end of the module name
    merge_weights=False  # whether to merge weights
)
# init adapter tuner config
adapter_config = AdapterConfig(
    dim=768,  # the dimension of the hidden states
    hidden_pos=0,  # the position of the hidden state to be passed into the adapter
    target_modules=r'.*attention.output.dense$',  # modules to replace, matched by regular expression
    adapter_length=10  # the length of the adapter
)
# init prompt tuner config
prompt_config = PromptConfig(
    dim=768,  # the dimension of the hidden states
    target_modules=r'.*layer\.\d+$',  # modules to replace, matched by regular expression
    embedding_pos=0,  # the position of the embedding tensor
    prompt_length=10,  # the length of the prompt tokens
    attach_front=False  # whether the prompt is attached in front of the embedding
)
# create model with swift. In practice, you can use any of these tuners or a combination of them.
model = Swift.prepare_model(model, {"lora_tuner": lora_config, "adapter_tuner": adapter_config, "prompt_tuner": prompt_config})
# get the trainable parameters of model
model.get_trainable_parameters()
# 'trainable params: 838,776 || all params: 87,406,432 || trainable%: 0.9596273189597764'
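The reported trainable-parameter count can be reproduced by hand, which helps when estimating the cost of a tuner combination. The sketch below assumes a ViT-base backbone with 12 encoder layers and hidden size 768, LoRA factors without biases, an adapter made of a down- and an up-projection with biases, and one set of prompt vectors per layer. These are assumptions for illustration, not SWIFT's documented accounting, although under them the sum matches the 838,776 shown above.

```python
HIDDEN = 768      # hidden size, matching dim=768 in the configs above
NUM_LAYERS = 12   # encoder depth of ViT-base (assumption)

# LoRA: two low-rank factors of shape (r, in) and (out, r) per target module.
lora_r = 10
lora_per_module = lora_r * (HIDDEN + HIDDEN)
lora_total = lora_per_module * 3 * NUM_LAYERS  # query, key, value in every layer

# Adapter: down-projection and up-projection, each with a bias (assumption).
adapter_length = 10
adapter_per_layer = (HIDDEN * adapter_length + adapter_length) + (adapter_length * HIDDEN + HIDDEN)
adapter_total = adapter_per_layer * NUM_LAYERS

# Prompt: prompt_length learnable vectors of size dim for every layer.
prompt_length = 10
prompt_total = prompt_length * HIDDEN * NUM_LAYERS

trainable = lora_total + adapter_total + prompt_total
all_params = 87_406_432  # total taken from the output above
percentage = 100 * trainable / all_params

print(f'lora: {lora_total:,} | adapter: {adapter_total:,} | prompt: {prompt_total:,}')
print(f'trainable params: {trainable:,} || trainable%: {percentage:.4f}')
```

Under these assumptions LoRA accounts for 552,960 parameters, the adapter for 193,656, and the prompt for 92,160.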
You can use the features offered by Peft in SWIFT:
from swift import LoraConfig, Swift
from peft import TaskType
lora_config = LoraConfig(target_modules=['query', 'key', 'value'], task_type=TaskType.CAUSAL_LM)
model_wrapped = Swift.prepare_model(model, lora_config)
# or call from_pretrained to load weights in the modelhub
model_wrapped = Swift.from_pretrained(model, 'some-id-in-the-modelscope-modelhub')
The saving strategies of Swift tuners and PEFT tuners are slightly different. You can name a tuner by:
model = Swift.prepare_model(model, {'default': LoRAConfig(...)})
model.save_pretrained('./output')
In the output dir, you will have a dir structure like this:
output
    |-- default
        |-- adapter_config.json
        |-- adapter_model.bin
    |-- adapter_config.json
    |-- adapter_model.bin
The config and weights stored directly in the output dir are those of extra_state_keys. This is different from PEFT, which stores the weights and config of the default tuner.
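Given the layout described above, the named tuners in an output dir can be discovered by looking for sub-directories that contain an adapter_config.json. The following sketch uses only the standard library and builds a stand-in directory to demonstrate; list_saved_tuners is a hypothetical helper, not a SWIFT API.

```python
import os
import tempfile


def list_saved_tuners(output_dir):
    """Return the names of sub-directories that contain an adapter_config.json."""
    names = []
    for entry in sorted(os.listdir(output_dir)):
        config_path = os.path.join(output_dir, entry, 'adapter_config.json')
        if os.path.isfile(config_path):
            names.append(entry)
    return names


# Build an empty stand-in for the layout shown above (no real weights involved).
with tempfile.TemporaryDirectory() as output_dir:
    os.makedirs(os.path.join(output_dir, 'default'))
    for folder in (output_dir, os.path.join(output_dir, 'default')):
        for filename in ('adapter_config.json', 'adapter_model.bin'):
            open(os.path.join(folder, filename), 'w').close()
    tuners = list_saved_tuners(output_dir)
    print(tuners)
```

The top-level adapter_config.json is skipped on purpose, since in this layout it belongs to extra_state_keys rather than to a named tuner.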
🔍 Learn More
- ModelScope Library is the model library of the ModelScope project, which contains a large number of popular models.
License
This project is licensed under the Apache License (Version 2.0).
☎ Contact Us
You can contact and communicate with us by joining our WeChat Group: