DeepSeek-OCR: An Innovative, Small-but-Mighty Model

Official site

https://github.com/deepseek-ai/DeepSeek-OCR

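Installation

Because installation is fairly involved, I have put together a one-click bundle. Follow this WeChat official account and reply deepseekocr to get it.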

shell

conda create -n deepseek-ocr python=3.12.9 -y
conda activate deepseek-ocr
git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
cd DeepSeek-OCR
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
# Download the prebuilt wheel from https://github.com/kingbri1/flash-attention/releases
pip install flash_attn-2.8.3+cu128torch2.8.0cxx11abiFALSE-cp312-cp312-win_amd64.whl
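
The wheel filename encodes the build it targets (CUDA version, torch version, Python tag, platform), so it is worth checking it against the environment before installing. Below is a minimal sketch, using only the standard library, that splits the filename above into its parts. The helper names `parse_flash_attn_wheel` and `current_python_tag` and the regular expression are my own illustration, not part of flash-attention or pip, and the pattern only assumes the naming scheme seen in the filename above.

```python
import re
import sys

# Hypothetical pattern: mirrors the naming scheme of the wheel used above.
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[\d.]+)\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)-(?P<py>cp\d+)-(?P=py)-(?P<platform>\w+)\.whl"
)


def parse_flash_attn_wheel(filename):
    """Split a prebuilt flash_attn wheel filename into its build tags."""
    match = WHEEL_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.groupdict()


def current_python_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp312'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


info = parse_flash_attn_wheel(
    "flash_attn-2.8.3+cu128torch2.8.0cxx11abiFALSE-cp312-cp312-win_amd64.whl"
)
print(info)
# True only when run inside an environment whose Python matches the wheel.
print("matches this interpreter:", info["py"] == current_python_tag())
```

For the commands above, the tags should read CUDA 128, torch 2.8.0, cp312, and win_amd64, which lines up with the `cu128` index URL, `torch==2.8.0`, and the Python 3.12 conda environment.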

Usage

There is no UI yet, so the steps below use the official command-line script. On Windows, the model has to be run in hf mode.

shell

cd DeepSeek-OCR-master/DeepSeek-OCR-hf

Next, edit run_dpsk_ocr.py and change the following three lines: the first is the prompt, the second is the input file, and the third is the output folder.

python

prompt = "<image>\nDescribe this image in detail."
image_file = 'input/ComfyUI_00258_.png'
output_path = 'output/'
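
Both paths above are relative, so they are resolved against the directory the script is launched from. The following is a small pre-flight sketch, under the assumption that you want to fail early on a missing input and create the output folder ahead of time. `prepare_run` is a hypothetical helper of mine, not something defined in run_dpsk_ocr.py.

```python
from pathlib import Path


def prepare_run(image_file, output_path):
    """Hypothetical pre-flight check: verify the input and create the output folder."""
    image = Path(image_file)
    if not image.is_file():
        raise FileNotFoundError(f"input image not found: {image.resolve()}")
    out_dir = Path(output_path)
    # Create the folder (and any parents) if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    return image, out_dir
```

Calling `prepare_run(image_file, output_path)` before starting inference surfaces a wrong path immediately, instead of after the model has been loaded.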

Then run the following command to start inference (on the first run, the model is downloaded automatically from Hugging Face):

shell

python run_dpsk_ocr.py

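In my testing, inference was fast and the results were strong, with VRAM usage at around 15.2 GB.

Some prompt examples from the official repository (you can ask a DeepSeek model to explain how each prompt is meant to be used):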

text

# document: <image>\n<|grounding|>Convert the document to markdown.
# other image: <image>\n<|grounding|>OCR this image.
# without layouts: <image>\nFree OCR.
# figures in document: <image>\nParse the figure.
# general: <image>\nDescribe this image in detail.
# rec: <image>\nLocate <|ref|>xxxx<|/ref|> in the image.
# '先天下之忧而忧'
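
To switch between these prompts without retyping them, they can be kept in a small lookup table. The sketch below copies the prompt strings from the list above. The task keys and the `build_prompt` helper are hypothetical names of mine, and using the sample string from the last line as the text to locate is my reading of that example.

```python
# Prompt strings copied from the list above; the dictionary keys are hypothetical labels.
PROMPTS = {
    "document": "<image>\n<|grounding|>Convert the document to markdown.",
    "other_image": "<image>\n<|grounding|>OCR this image.",
    "free_ocr": "<image>\nFree OCR.",
    "figure": "<image>\nParse the figure.",
    "general": "<image>\nDescribe this image in detail.",
}


def build_prompt(task, ref_text=None):
    """Return the prompt for a task; 'locate' needs the text to search for."""
    if task == "locate":
        if not ref_text:
            raise ValueError("the 'locate' task requires ref_text")
        return f"<image>\nLocate <|ref|>{ref_text}<|/ref|> in the image."
    if task not in PROMPTS:
        raise KeyError(f"unknown task: {task}")
    return PROMPTS[task]


print(repr(build_prompt("document")))
print(repr(build_prompt("locate", "先天下之忧而忧")))
```

The returned string can then be assigned to `prompt` in run_dpsk_ocr.py.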

Finally, if you found this article useful, please consider buying me a coffee. Thank you!
