[Try Technique] Trying FinGPT, an LLM Fine-Tuned for the Financial Domain

Published

September 10, 2023

Category

Try Technique

Tags

LLMs, ChatGLM2, Llama2, Sentiment Analysis, PyTorch, Transformers, Hugging Face, Financial Data

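Trying out FinGPT, an LLM (large language model) fine-tuned for the financial domain that aims to be open-source, lightweight, and low-cost, to classify the sentiment polarity of news headlines. PS. Includes notes on downloading Hugging Face models and customizing the cache path.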

Photo from FinGPT

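Because of my research topic, I have recently been working with various LLMs fine-tuned for the financial domain. One of the newer ones is today's subject, FinGPT. Its models are also available on Hugging Face. This post gives a brief introduction to FinGPT, then records how I installed and used version 3.1, covering both the official example and my own experiments.

PS. This post is not aimed at complete beginners. You can certainly follow along for fun, but it does not explain NLP and LLM concepts such as tokenization, Transformers, LoRA, model merging, or sentiment classification.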


Introduction to FinGPT

FinGPT is built around a few core ideas:

Data-centric
Open-source
Lightweight, which keeps costs low

Data Sources

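So what exactly is open-sourced? Some data sources and some models.

As for the data sources, some clearly require an external API token, for example from Finnhub. If your Finnhub API access is limited (the free tier restricts both request volume and data types), you will get less data from those sources. Personally I find Finnhub's limits quite generous, and FinGPT (along with FinNLP, another project from the same team) has already collected a fair amount of data.

Because the project is led by Chinese developers, it also includes Chinese-language data (in Simplified Chinese, of course).

The full list of data sources is on their introduction page (a Simplified Chinese version is also available).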

Models and versions

FinGPT does not fine-tune just one LLM.

FinGPT-v1: Eastmoney data + ChatGLM2 fine-tuned with LoRA
FinGPT-v2: Financial Modeling Prep data (requires applying for an API key) + Llama2 fine-tuned with LoRA
FinGPT-v3：FinGPT v3 series are LLMs finetuned with LoRA method on the News and Tweets sentiment analysis dataset which achieve best scores on most of the financial sentiment analysis datasets.

FinGPT-v3.1：chatglm2-6B as base model
FinGPT-v3.2：llama2-7b as base model
FinGPT-v3.3: the detailed Benchmark Results already list a 3.3 version that uses OpenAI, but it has not been described in detail yet.

Running v3.1 (ChatGLM2-6B) locally

As usual, here is my environment for reference:

OS: Windows 10
RAM: 64GB, 3200 MHz
CPU: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
GPU: NVIDIA GeForce RTX 3080 Laptop GPU
Virtual Python Environment: 3.10.12
CUDA: 11.8
PyTorch: Stable (2.0.1)

I have also run the Llama2 version, but it is slightly more involved, so here I will start with the version based on ChatGLM2-6B (an open-source bilingual LLM).

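Step 1: Check your CUDA installation

First make sure the CUDA version you install matches your PyTorch build.

There are already plenty of CUDA installation tutorials online, so I will not repeat them. Here is just a short checklist:

Install the matching CUDA version: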

CUDA Toolkit 11.8 Downloads


developer.nvidia.com

Install the matching cuDNN (requires logging in to an NVIDIA account)

NVIDIA cuDNN Archive

Explore and download past releases from cuDNN GPU-accelerated primitive library for deep neural networks for your development work.

developer.nvidia.com

If nvcc -V does not print your CUDA information, check your environment variables.
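If you would rather run the nvcc check from Python, here is a minimal sketch using only the standard library. The function name nvcc_version is my own, not part of any CUDA tooling, and the sketch only tells you whether nvcc is reachable on PATH; it does not validate the installation itself.

```python
import shutil
import subprocess


def nvcc_version():
    """Return the text printed by `nvcc -V`, or None if nvcc is not on PATH."""
    nvcc_path = shutil.which("nvcc")
    if nvcc_path is None:
        return None
    result = subprocess.run([nvcc_path, "-V"], capture_output=True, text=True)
    return result.stdout.strip()


info = nvcc_version()
print(info if info else "nvcc not found: check that the CUDA bin folder is on PATH")
```

If this prints the "not found" message right after installing CUDA, opening a new terminal (so the updated PATH is picked up) is usually the first thing to try.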

Step 2: Install PyTorch

See:

PyTorch

An open source machine learning framework that accelerates the path from research prototyping to production deployment.

pytorch.org

Once PyTorch is installed, first test whether it can see the GPU:

import torch
torch.cuda.is_available()

If it returns True, you are good to go.
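When the check returns False, it helps to see a bit more detail. The sketch below gathers a few extra facts in one place; gpu_report is a name I made up, while the torch calls themselves (torch.version.cuda, torch.cuda.get_device_name) are standard PyTorch attributes. It deliberately tolerates a missing torch install so it can be run at any point during setup.

```python
def gpu_report():
    """Collect basic PyTorch/CUDA facts; still works if torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


print(gpu_report())
```

A built_with_cuda value of None means a CPU-only PyTorch build was installed, which is a common reason for is_available() returning False even when CUDA itself is fine.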

Step 3: Install Transformers and related packages

It is better to install Transformers with pip install 'transformers[flax]' after PyTorch is already in place.

If you are told along the way that a package is missing, just install it.

Also remember to pip install sentencepiece (Unsupervised text tokenizer for Neural Network-based text generation by Google)

and pip install peft (Parameter-Efficient Fine-Tuning, PEFT).
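To confirm that everything from Steps 2 and 3 actually landed in the active virtual environment, a quick check like the following can help. This is only a sketch using the standard library's importlib.metadata; the REQUIRED list and the function name installed_versions are my own choices.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages mentioned in Steps 2 and 3 of this post.
REQUIRED = ["torch", "transformers", "sentencepiece", "peft"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```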

Installation

huggingface.co

Step 4: Pitfalls in the FinGPT 3.1 example, plus notes on Hugging Face model downloads and a custom cache path

The v3 example code provided by FinGPT can be found in these two places:

github.com

oliverwang15/FinGPT_ChatGLM2_Sentiment_Instruction_LoRA_FT · Hugging Face

huggingface.co

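This one is also based on ChatGLM2, but they later released v3.1, which performs better than v3. When I run the examples with v3.1, the answers differ from v3's. That is expected, since the versions are different :D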

oliverwang15/FinGPT_v31_ChatGLM2_Sentiment_Instruction_LoRA_FT · Hugging Face

huggingface.co

Finally, here is v3.2, which uses Llama2 as the base model:

oliverwang15/FinGPT_v32_Llama2_Sentiment_Instruction_LoRA_FT · Hugging Face

huggingface.co
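Since the three adapter repositories have long and very similar names, a small lookup table can save some copy-paste mistakes. This is just a convenience sketch: the repository IDs are the ones linked above, while the dictionary and the adapter_repo helper are hypothetical names of mine.

```python
# LoRA adapter repositories linked in this post, keyed by FinGPT version.
FINGPT_V3_ADAPTERS = {
    "v3": "oliverwang15/FinGPT_ChatGLM2_Sentiment_Instruction_LoRA_FT",
    "v3.1": "oliverwang15/FinGPT_v31_ChatGLM2_Sentiment_Instruction_LoRA_FT",
    "v3.2": "oliverwang15/FinGPT_v32_Llama2_Sentiment_Instruction_LoRA_FT",
}


def adapter_repo(fingpt_version):
    """Return the Hugging Face repo ID of the adapter for a FinGPT v3.x version."""
    try:
        return FINGPT_V3_ADAPTERS[fingpt_version]
    except KeyError:
        raise ValueError(f"Unknown FinGPT version: {fingpt_version!r}") from None


print(adapter_repo("v3.1"))
```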

First, loading the model. Normally you can use the model name from Hugging Face directly, like this:

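from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Hugging Face repo IDs: the base model and the FinGPT LoRA adapter
base_model = "THUDM/chatglm2-6b"
peft_model = "oliverwang15/FinGPT_v31_ChatGLM2_Sentiment_Instruction_LoRA_FT"
tokenizer = AutoTokenizer.from_pretrained(
    base_model,
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    base_model,
    trust_remote_code=True,
    device_map="auto"
)
# Attach the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(
    model,
    peft_model
)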

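The model is then downloaded automatically into the Hugging Face cache, which by default sits in your user folder on the C drive, somewhere like C:\Users\(your username)\.cache\huggingface. Personally I do not like piling things up on the C drive, especially since this so-called cache is something I will keep using. So you can use cache_dir to specify where the cache should go: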

base_model = "THUDM/chatglm2-6b"
peft_model = "oliverwang15/FinGPT_v31_ChatGLM2_Sentiment_Instruction_LoRA_FT"
tokenizer = AutoTokenizer.from_pretrained(
    base_model,
    trust_remote_code=True,
    cache_dir='path to your cache folder'
)
model = AutoModel.from_pretrained(
    base_model,
    trust_remote_code=True,
    device_map="auto",
    cache_dir='path to your cache folder'
)
model = PeftModel.from_pretrained(
    model,
    peft_model,
    cache_dir='path to your cache folder'
)
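As an alternative to passing cache_dir on every call, the Hugging Face libraries also read the HF_HOME environment variable, and hub downloads then go into a hub subfolder beneath it. A minimal sketch, assuming a hypothetical D:\hf_cache folder (the path is only an illustration):

```python
import os

# Hypothetical folder on another drive; change it to wherever you want the cache.
HF_CACHE_ROOT = r"D:\hf_cache"

# Set this BEFORE importing transformers or huggingface_hub, because the
# libraries read HF_HOME when they are first imported.
os.environ["HF_HOME"] = HF_CACHE_ROOT

print(os.environ["HF_HOME"])
```

Setting the variable at the operating-system level works too, and then no code change is needed at all.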

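From then on, point the base_model and peft_model variables at the saved model folders and nothing has to be downloaded again. Note: the top-level cache folder cannot be used as the path directly. The real model folder is the one inside it that contains config.json. For convenience, you can of course also download the model from Hugging Face manually and load it straight from that path.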
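Finding that inner folder by hand gets tedious, because the cache nests it several levels deep (typically a models--ORG--NAME folder, then snapshots, then a folder named after a commit hash). The sketch below walks a cache directory and lists candidate folders. The function name is mine, and checking for adapter_config.json as well is my own addition, on the assumption that a LoRA adapter folder carries that file rather than config.json.

```python
import os

# Files whose presence marks a loadable model (or LoRA adapter) folder.
MARKER_FILES = ("config.json", "adapter_config.json")


def find_model_dirs(cache_dir):
    """Return every folder under cache_dir that directly contains a marker file."""
    matches = []
    for root, _dirs, files in os.walk(cache_dir):
        if any(name in files for name in MARKER_FILES):
            matches.append(root)
    return sorted(matches)


# Example: list candidates in the current directory (swap in your cache path).
for path in find_model_dirs("."):
    print(path)
```

Any path it prints is one you can try assigning to base_model or peft_model.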

Next, the pitfalls.

If you run the FinGPT example as-is, you may (depending on your setup) get the following warnings and error:

UserWarning: max_length is ignored when padding=True and there is no truncation strategy. To pad to max length, use padding='max_length'.
UserWarning: You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('cuda') before running .generate().
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

The first warning goes away if you change the call like this:

tokenizer(prompt, return_tensors='pt', padding='max_length', max_length=512)

Or like this:

tokenizer(prompt, return_tensors='pt', padding=True)
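The warning itself points at a third option: max_length is ignored only because no truncation strategy is set. If you want to pad to the longest prompt in the batch and still cap the length at 512 tokens, you can turn truncation on. A sketch of that variant, wrapped in a helper whose name (encode_prompts) is my own:

```python
def encode_prompts(tokenizer, prompts, max_length=512):
    """Pad to the longest prompt in the batch and cut anything over max_length."""
    return tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )
```

Usage would be tokens = encode_prompts(tokenizer, prompt) with the tokenizer and prompt list from the example. Keep in mind that truncation cuts from the end by default, so a very long headline could lose the trailing "Answer: " part of the prompt; for short headlines like the ones here that should not come up.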

The second warning and the third error are fixed together:

from torch import cuda

device = 'cuda' if cuda.is_available() else 'cpu'
cuda.empty_cache()
print(f'Device: {device}')

# Load the tokenizer and models here, as shown above

tokens = tokenizer(prompt, return_tensors='pt', padding=True)
tokens.to(device)
model = model.to(device)

This puts everything on the same device, which may be the CPU or the GPU depending on your setup.
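Another way to get the same effect is to leave the model where device_map="auto" put it and move only the inputs to the model's device. This is a sketch rather than what the official example does, and it assumes the whole model sits on a single device; the helper name is mine.

```python
def inputs_to_model_device(model, tokens):
    """Move tokenized inputs to the device of the model's first parameter."""
    model_device = next(model.parameters()).device
    return tokens.to(model_device)
```

With this, tokens = inputs_to_model_device(model, tokens) would take the place of the two .to(device) lines, and the explicit device variable is no longer needed for the inputs.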

So the final example code becomes:

from transformers import AutoModel, AutoTokenizer
from peft import PeftModel
from torch import cuda

device = 'cuda' if cuda.is_available() else 'cpu'
cuda.empty_cache()
print(f'Device: {device}')

# Load Models
# base_model = "THUDM/chatglm2-6b"
# peft_model = "oliverwang15/FinGPT_v31_ChatGLM2_Sentiment_Instruction_LoRA_FT"
base_model = "path to your saved THUDM--chatglm2-6b folder"
peft_model = "path to your saved oliverwang15--FinGPT_v31_ChatGLM2_Sentiment_Instruction_LoRA_FT folder"
tokenizer = AutoTokenizer.from_pretrained(
    base_model,
    trust_remote_code=True,
    cache_dir='path to your cache folder'
)
model = AutoModel.from_pretrained(
    base_model,
    trust_remote_code=True,
    device_map="auto",
    cache_dir='path to your cache folder'
)
model = PeftModel.from_pretrained(
    model,
    peft_model,
    cache_dir='path to your cache folder'
)
model = model.eval()

# Make prompts
prompt = [
'''Instruction: What is the sentiment of this news? Please choose an answer from {negative/neutral/positive}
Input: FINANCING OF ASPOCOMP 'S GROWTH Aspocomp is aggressively pursuing its growth strategy by increasingly focusing on technologically more demanding HDI printed circuit boards PCBs .
Answer: ''',
'''Instruction: What is the sentiment of this news? Please choose an answer from {negative/neutral/positive}
Input: According to Gran , the company has no plans to move all production to Russia , although that is where the company is growing .
Answer: ''',
'''Instruction: What is the sentiment of this news? Please choose an answer from {negative/neutral/positive}
Input: A tinyurl link takes users to a scamming site promising that users can earn thousands of dollars by becoming a Google ( NASDAQ : GOOG ) Cash advertiser .
Answer: ''',
]

# Generate results
# tokens = tokenizer(prompt, return_tensors='pt', padding='max_length', max_length=512)
tokens = tokenizer(prompt, return_tensors='pt', padding=True)

tokens.to(device)
model = model.to(device)

res = model.generate(**tokens, max_length=512)
res_sentences = [tokenizer.decode(i) for i in res]
out_text = [o.split("Answer: ")[1] for o in res_sentences]

# show results
for sentiment in out_text:
    print(sentiment)
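The raw text after "Answer: " can carry extra tokens or different capitalization, so if you plan to store the results it may be worth normalizing them to one of the three labels. A minimal sketch in plain Python; extract_sentiment and LABELS are names I introduced, and the sample strings are made up for illustration rather than real model output.

```python
LABELS = ("negative", "neutral", "positive")


def extract_sentiment(generated_text):
    """Return the first known label found after 'Answer: ', or None."""
    _, sep, answer = generated_text.partition("Answer: ")
    if not sep:
        return None
    answer = answer.lower()
    # Record where each label occurs and keep the earliest one.
    hits = [(answer.find(label), label) for label in LABELS if label in answer]
    return min(hits)[1] if hits else None


print(extract_sentiment("Input: some headline\nAnswer: Positive"))
```

In the final example this could replace the split call: out_text = [extract_sentiment(o) for o in res_sentences]. Anything that comes back as None is worth inspecting by hand.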

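Runtime notes for my own (unofficial) run

You can take the official example above and swap in other news headlines to classify their sentiment. If you want to run more examples, adjust the code accordingly.

My machine is not high-end: classifying 587,434 English news headlines took me nearly 23 hours. Of course they cannot all be fed in at once, or my GPU memory blows up (it gives out at the generate step). After a bit of testing, 100 headlines per batch was already pushing it.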
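Feeding the headlines in chunks can be done with a small generator. This is a sketch of the idea rather than the exact code I ran; batched is my own helper name, and the default of 100 simply mirrors the limit I hit on my GPU.

```python
def batched(items, batch_size=100):
    """Yield consecutive slices of items, each holding at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Tiny demonstration with placeholder strings instead of real headlines.
demo = [f"headline {i}" for i in range(5)]
for batch in batched(demo, batch_size=2):
    print(batch)
```

Each batch would then go through the tokenizer and model.generate as in the final example above, with the answers collected into one list as you go.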

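Doing the math, it is indeed lighter and cheaper than many approaches. It took that long because my hardware is not good enough, and other methods might well take even longer.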
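For a rough sense of the throughput behind those numbers, here is the arithmetic spelled out. It assumes exactly 23 hours and batches of 100 throughout, both of which are approximations of what actually happened, so treat the results as ballpark figures.

```python
import math

TOTAL_HEADLINES = 587_434
TOTAL_HOURS = 23    # "nearly 23 hours", so this is approximate
BATCH_SIZE = 100    # assumed constant for the whole run

total_seconds = TOTAL_HOURS * 3600
headlines_per_second = TOTAL_HEADLINES / total_seconds
num_batches = math.ceil(TOTAL_HEADLINES / BATCH_SIZE)
seconds_per_batch = total_seconds / num_batches

print(f"{headlines_per_second:.1f} headlines per second")
print(f"{num_batches} batches, about {seconds_per_batch:.1f} s per batch")
```

That works out to roughly 7 headlines per second, or about 14 seconds for each batch of 100.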

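Postscript

I now want to run Chinese news headlines, but those produce more tokens, so I am carefully shopping around for a cloud GPU. I would love to run several GPUs in parallel, but my wallet does not allow it (insert a not-enough-GPU meme here). Anyway, I will come back and write it up once I get it to run.

As always, if anything above is wrong, please leave a comment below to let me know, and feel free to share your own experience after trying it.

Or recommend a cloud GPU to me!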
