AI Fundamentals Notes (25): Building an Agent That Optimizes PDF Resumes

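This article describes how to build an agent that parses a resume in PDF format, optimizes the resume based on its content, reorganizes the content into a new layout, and writes the result out as a PDF file.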

1. Environment Dependencies

Reference: https://blog.csdn.net/jimmyleeee/article/details/155646865

The required libraries differ somewhat from that setup; install the following:

pip install langchain-community langchain-ollama pymupdf reportlab fpdf2 weasyprint
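
Before running the agent, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of the original article: the mapping from pip package names to import names is an assumption (for example, `pymupdf` is assumed to be importable as `fitz`), and `find_missing` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumed mapping from pip package name to top-level import name.
PACKAGES = {
    "langchain-community": "langchain_community",
    "langchain-ollama": "langchain_ollama",
    "pymupdf": "fitz",
    "reportlab": "reportlab",
    "fpdf2": "fpdf",
    "weasyprint": "weasyprint",
}


def find_missing(packages):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if find_spec(module_name) is None
    ]


missing = find_missing(PACKAGES)
if missing:
    print("Missing packages: pip install " + " ".join(missing))
else:
    print("All dependencies found.")
```

`find_spec` only locates a top-level module without importing it, so the check stays fast and does not trigger WeasyPrint's native library loading.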

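2. Building an Agent That Optimizes Resumes

Three libraries were tried for saving the result as a PDF file:

  • reportlab: implemented in save_as_pdf1. The output is garbled: Chinese characters appear as black rectangles in the PDF, most likely because the default font used by styles['Normal'] contains no CJK glyphs.
  • fpdf2: implemented in save_as_pdf2. The output is garbled: Chinese characters appear as question marks in the PDF, because the function encodes every line to latin-1 with unencodable characters replaced.
  • weasyprint: implemented in save_as_pdf. The output is not garbled; its Unicode support is comparatively good.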
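
The question marks from the fpdf2 version can be reproduced without generating any PDF. The sketch below is illustrative only: it isolates the latin-1 round-trip that save_as_pdf2 applies to each line, and the helper names `to_latin1_lossy` and `count_lost_characters` are hypothetical.

```python
def to_latin1_lossy(text):
    """Mimic the round-trip in save_as_pdf2: unencodable characters become '?'."""
    return text.encode("latin-1", "replace").decode("latin-1")


def count_lost_characters(text):
    """Count characters that Latin-1 cannot represent (code points above 0xFF)."""
    return sum(1 for ch in text if ord(ch) > 0xFF)


sample = "Resume: 张三"
print(to_latin1_lossy(sample))        # Resume: ??
print(count_lost_characters(sample))  # 2
```

Every Chinese character falls outside Latin-1, so the whole name is lost before the text ever reaches the PDF font.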
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_ollama import ChatOllama
from langchain_core.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
import json

# Configuration
base_url = "http://localhost:11434"
model = "qwen2.5"

# Initialize LLM
llm = ChatOllama(model=model, base_url=base_url)

# System prompt for resume polishing
system_prompt = SystemMessagePromptTemplate.from_template(
    """You are an expert HR consultant and resume writer. You specialize in optimizing resumes for Applicant Tracking Systems (ATS) 
    and making them visually appealing while maintaining all relevant information. You always provide well-structured, professional 
    responses in proper English."""
)

# Resume polishing prompt
polish_prompt = HumanMessagePromptTemplate.from_template("""
    You are tasked with taking a parsed resume and creating an improved, professionally formatted version.
    
    Original Resume Content:
    {resume_content}
    
    Please perform the following tasks:
    1. Restructure the information into a clear, logical format with appropriate sections
    2. Improve wording and phrasing to make it more professional and impactful
    3. Ensure consistent formatting throughout
    4. Optimize for readability while preserving all original information
    5. Fill in any missing section headers if needed
    
    Expected Output Structure:
    - Contact Information (Name, Email, Phone, LinkedIn/Website if available)
    - Professional Summary (A brief 2-3 sentence overview)
    - Technical Skills (Organized by category)
    - Work Experience (With company, role, dates, and bullet-point responsibilities/achievements)
    - Education (Degree, institution, graduation date)
    - Projects (If applicable)
    - Certifications (If applicable)
    - Additional Information (Languages, interests, etc.)
    
    Important Guidelines:
    - Maintain all factual information from the original
    - Use strong action verbs
    - Quantify achievements where possible
    - Keep formatting clean and professional
    - Do not add fictional information
    
    Provide only the polished resume content in a structured format, without any markdown or extra explanations.
""")

def load_pdf(file_path):
    """
    Load and parse PDF file
    
    Args:
        file_path (str): Path to the PDF file
        
    Returns:
        str: Parsed content from the PDF
    """
    loader = PyMuPDFLoader(file_path)
    docs = loader.load()
    return "\n".join([doc.page_content for doc in docs])

def polish_resume(resume_content):
    """
    Polish resume content using LLM
    
    Args:
        resume_content (str): Raw parsed resume content
        
    Returns:
        str: Polished resume content
    """
    messages = [system_prompt, polish_prompt]
    template = ChatPromptTemplate.from_messages(messages)
    
    polish_chain = template | llm | StrOutputParser()
    return polish_chain.invoke({"resume_content": resume_content})

def save_as_txt(content, output_path):
    """
    Save polished content as text file
    
    Args:
        content (str): Polished resume content
        output_path (str): Output file path
    """
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(content)

# implemented by weasyprint
def save_as_pdf(content, output_path):
    """
    Save polished content as PDF file using WeasyPrint (best Unicode support)
    
    Args:
        content (str): Polished resume content
        output_path (str): Output PDF file path
    """
    try:
        from html import escape

        from weasyprint import HTML
        
        # Create HTML content with proper encoding; escape the text so that
        # characters such as < and & in the resume do not break the markup
        html_content = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <meta charset="UTF-8">
            <style>
                body {{
                    font-family: "Microsoft YaHei", "SimSun", "PingFang SC", sans-serif;
                    font-size: 12pt;
                    line-height: 1.5;
                    margin: 1in;
                }}
                pre {{
                    white-space: pre-wrap;
                    word-wrap: break-word;
                }}
            </style>
        </head>
        <body>
            <pre>{escape(content)}</pre>
        </body>
        </html>
        """
        
        # Generate PDF
        html = HTML(string=html_content)
        html.write_pdf(output_path)
        print(f"PDF saved to {output_path}")
        
    except ImportError:
        print("WeasyPrint not installed. Install with: pip install weasyprint")
        txt_path = output_path.replace('.pdf', '.txt')
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path}")
    except Exception as e:
        print(f"Error creating PDF with WeasyPrint: {e}")
        # Fallback to text file
        txt_path = output_path.replace('.pdf', '.txt')
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path} as fallback")

# implemented by fpdf2
def save_as_pdf2(content, output_path):
    """
    Save polished content as PDF file using fpdf2 with better Unicode support
    
    Args:
        content (str): Polished resume content
        output_path (str): Output PDF file path
    """
    try:
        from fpdf import FPDF
        
        class PDF(FPDF):
            def header(self):
                pass
                
            def footer(self):
                pass
                
        pdf = PDF()
        pdf.add_page()
        pdf.set_auto_page_break(auto=True, margin=15)
        
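        # Set font (you may need to specify a path to a Unicode font)
        try:
            # Try to use a Unicode font (note: DejaVu Sans has no Chinese glyphs)
            pdf.add_font('DejaVu', '', 'DejaVuSans.ttf', uni=True)
            pdf.set_font('DejaVu', '', 12)
        except Exception:
            # Fallback to built-in font (Latin-1 characters only)
            pdf.set_font('Arial', '', 12)
            
        # Add content
        # NOTE: the latin-1 round-trip below replaces every character outside
        # Latin-1 (including all Chinese text) with '?', even when the Unicode
        # font was loaded. This is the source of the question marks in the PDF.
        lines = content.split('\n')
        for line in lines:
            pdf.cell(0, 10, line.encode('latin-1', 'replace').decode('latin-1'), ln=True)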
            
        pdf.output(output_path)
        print(f"PDF saved to {output_path}")
        
    except ImportError:
        print("fpdf2 not installed. Saving as TXT instead.")
        txt_path = output_path.replace('.pdf', '.txt')
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path}")
    except Exception as e:
        print(f"Error creating PDF: {e}")
        # Fallback to text file
        txt_path = output_path.replace('.pdf', '.txt')
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path} as fallback")

# implemented by reportlab
def save_as_pdf1(content, output_path):
    """
    Save polished content as PDF file (requires additional libraries)
    
    Args:
        content (str): Polished resume content
        output_path (str): Output PDF file path
    """
    try:
        from xml.sax.saxutils import escape

        from reportlab.lib.pagesizes import letter
        from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
        from reportlab.lib.styles import getSampleStyleSheet
        
        doc = SimpleDocTemplate(output_path, pagesize=letter)
        styles = getSampleStyleSheet()
        story = []
        
        # Split content into lines and create paragraphs
        lines = content.split('\n')
        for line in lines:
            if line.strip():
                # Paragraph parses inline markup, so escape &, < and > first
                story.append(Paragraph(escape(line), styles['Normal']))
                story.append(Spacer(1, 12))
        
        doc.build(story)
        print(f"PDF saved to {output_path}")
    except ImportError:
        print("reportlab not installed. Saving as TXT instead.")
        txt_path = output_path.replace('.pdf', '.txt')
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path}")

def process_resume(input_pdf_path, output_path):
    """
    Main function to process resume from PDF to polished format
    
    Args:
        input_pdf_path (str): Input PDF file path
        output_path (str): Output file path (.txt or .pdf)
    """
    # Load PDF
    print("Loading PDF...")
    raw_content = load_pdf(input_pdf_path)
    
    # Polish content
    print("Polishing resume with LLM...")
    polished_content = polish_resume(raw_content)
    
    # Save result
    if output_path.endswith('.pdf'):
        save_as_pdf(polished_content, output_path)
    else:
        save_as_txt(polished_content, output_path)
    
    print(f"Polished resume saved to {output_path}")
    return polished_content


if __name__ == "__main__":
    input_path = "resume/test_resume.pdf"
    output_path = "polished_resumes/test_polished.pdf"  # or .txt
    polished = process_resume(input_path, output_path)
    print(polished)
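
Two details of the path handling above may be worth hardening. The save functions derive the fallback name with `output_path.replace('.pdf', '.txt')`, which rewrites every occurrence of `.pdf` in the path, and nothing creates the `polished_resumes` directory if it does not exist yet. The following is a minimal sketch using `pathlib`; `fallback_txt_path` and `ensure_parent_dir` are hypothetical helpers, not functions from the script above.

```python
import tempfile
from pathlib import Path


def fallback_txt_path(output_path):
    """Return output_path with only its final suffix replaced by .txt."""
    return Path(output_path).with_suffix(".txt")


def ensure_parent_dir(output_path):
    """Create the directory that will contain output_path if it is missing."""
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)


# Only the last suffix changes, unlike str.replace('.pdf', '.txt').
print(fallback_txt_path("out/my.pdf.v2.pdf").name)  # my.pdf.v2.txt

# Demonstrate directory creation inside a throwaway temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "polished_resumes" / "test_polished.pdf"
    ensure_parent_dir(target)
    print(target.parent.is_dir())  # True
```

Calling `ensure_parent_dir(output_path)` at the start of the save step would be one way to avoid a failure when the output folder has not been created by hand.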

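3. Running the Agent for Testing

With the weasyprint version, the run failed with the following error:

"Error creating PDF with WeasyPrint: cannot load library 'libgobject-2.0-0': error 0x7e. Additionally, ctypes.util.find_library() did not manage to locate a library called 'libgobject-2.0-0'

TXT saved to polished_resumes/test_polished.txt as fallback"

After some searching, the fix turned out to be installing GTK (download: https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer) and adding the directory that contains the GTK executables to the PATH environment variable.

Because of this error, no PDF was generated and a TXT file was written instead. The TXT file contains no garbled characters either.

However, company names and university names in the output were not translated into English; they were left in Chinese as they appeared in the original.

Overall, the optimized resume works as a reference: open it in Word or WPS, make a few simple edits, and it is usable. Readers who want to take this further can try to get the result right in a single pass by solving the problem of untranslated company and school names.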
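
As a starting point for the untranslated-name problem, the polished text can be scanned for leftover Chinese fragments so they can be reviewed or sent back to the model. This is a minimal sketch under stated assumptions: `find_untranslated` is a hypothetical helper, and the pattern only covers the basic CJK Unified Ideographs block, not every CJK extension.

```python
import re

# Basic CJK Unified Ideographs block; an assumption that covers common
# Chinese company and university names but not rarer extension blocks.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]+")


def find_untranslated(text):
    """Return (line_number, fragment) pairs for CJK runs left in the text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in CJK_PATTERN.finditer(line):
            hits.append((number, match.group()))
    return hits


polished = (
    "Name: John Doe\n"
    "Company: 某某科技有限公司\n"
    "Education: 某某大学, 2015"
)
for line_number, fragment in find_untranslated(polished):
    print(f"line {line_number}: {fragment}")
```

The returned fragments could then be listed in a follow-up prompt asking the model to translate only those names, although whether that resolves the issue with this model is untested here.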
