https://github.com/facebookresearch/nougat
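Python 3.8 or later is required.
Install: pip install nougat-ocr
Default model download location: /home/****/.cache/torch/hub/nougat-0.1.0-small
Right after installation, nougat runs on the CPU by default and prints the following warning: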
UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
WARNING:root:No GPU found. Conversion on CPU is very slow.
To use the GPU, reinstall torch and its companion packages in a build that matches your CUDA version. Mine is CUDA 11.8:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
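After reinstalling, it is worth confirming that PyTorch can actually see the GPU before running a long conversion. The following is a minimal sketch, not part of nougat itself; the helper name `describe_torch_device` is made up for illustration, and it only uses standard PyTorch calls.

```python
def describe_torch_device():
    """Return a short summary of the device PyTorch would use."""
    try:
        # Imported lazily so the check also works before torch is installed.
        import torch
    except ImportError:
        return "PyTorch is not installed"

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this PyTorch build targets.
        name = torch.cuda.get_device_name(0)
        return f"GPU: {name} (PyTorch built for CUDA {torch.version.cuda})"

    # torch.version.cuda is None for CPU-only builds.
    return f"CPU only (PyTorch {torch.__version__}, built for CUDA {torch.version.cuda})"


print(describe_torch_device())
```

If this still reports "CPU only" and the driver warning appears again, the installed PyTorch build probably targets a newer CUDA version than the driver supports.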
Once the environment is set up, PDF recognition can be run.
A file in .mmd format is generated in the output directory.
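The conversion step can also be scripted. The sketch below wraps the `nougat <pdf> -o <output_dir>` command line from the project README with `subprocess`. The helper names and the `paper.pdf` file in the usage note are hypothetical, and the assumption that the result is named after the PDF with a `.mmd` suffix is my understanding of the tool's behavior rather than something these notes confirm.

```python
import subprocess
from pathlib import Path


def build_nougat_command(pdf_path, out_dir):
    """Build the CLI call `nougat <pdf> -o <out_dir>` as an argument list."""
    return ["nougat", str(pdf_path), "-o", str(out_dir)]


def expected_mmd_path(pdf_path, out_dir):
    """Assumed output location: <out_dir>/<pdf name without .pdf>.mmd"""
    return Path(out_dir) / (Path(pdf_path).stem + ".mmd")


def convert_pdf(pdf_path, out_dir="output"):
    """Run nougat on one PDF and return the generated .mmd text."""
    # check=True raises CalledProcessError if nougat exits with an error.
    subprocess.run(build_nougat_command(pdf_path, out_dir), check=True)
    return expected_mmd_path(pdf_path, out_dir).read_text(encoding="utf-8")
```

For example, `text = convert_pdf("paper.pdf")` would run the conversion and read `output/paper.mmd`, provided nougat is installed and on the PATH.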
In VS Code, a preview extension that supports the .mmd format can display the contents, and the text can be copied directly.
On a 3090 GPU:
GPU memory usage was 17368 / 24576 MiB (about 17 GB), and a 16-page PDF took 30 seconds.
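To put those numbers in perspective, here is a small sketch that turns them into per-page figures. The function name `summarize_run` is made up for illustration, and the inputs are just the single measurement above, so treat the result as a rough indication rather than a benchmark.

```python
def summarize_run(pages, seconds, used_mib, total_mib):
    """Derive simple throughput and memory figures from one run."""
    return {
        "seconds_per_page": seconds / pages,
        "pages_per_minute": pages * 60 / seconds,
        "memory_fraction": used_mib / total_mib,
        "used_gib": used_mib / 1024,
    }


# Values taken from the run described above.
stats = summarize_run(pages=16, seconds=30, used_mib=17368, total_mib=24576)
for key, value in stats.items():
    print(f"{key}: {value:.3f}")
```

That works out to a little under 2 seconds per page, with roughly 70% of the card's memory in use.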
Arbitrary text I wrote myself may not be recognized, and text inside images cannot be recognized.