AI Fundamentals Notes 25: Building an Agent That Optimizes PDF Resumes

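This article shows how to build an Agent that parses a resume in PDF format, improves it based on its content, reorganizes the content into a new layout, and writes the result out as a PDF file.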

1. Environment Dependencies

Reference: https://blog.csdn.net/jimmyleeee/article/details/155646865

The required libraries differ slightly from that article. Install them with:
pip install langchain-community langchain-ollama pymupdf reportlab fpdf2 weasyprint
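Before running the script, it can help to confirm that every package is importable. The sketch below is a hypothetical helper (`check_dependencies` is not part of the original code). It maps each pip package to the module it provides (`pymupdf` is importable as `fitz`, `fpdf2` as `fpdf`) and uses `importlib.util.find_spec`, which locates a module without importing it. Because nothing is imported, it will not detect WeasyPrint's native GTK dependency on Windows described in section 3. The script also assumes an Ollama server at `http://localhost:11434` with the `qwen2.5` model already pulled (`ollama pull qwen2.5`).

```python
import importlib.util

# pip package name -> top-level module name it provides
REQUIRED_MODULES = {
    "langchain-community": "langchain_community",
    "langchain-ollama": "langchain_ollama",
    "pymupdf": "fitz",
    "reportlab": "reportlab",
    "fpdf2": "fpdf",
    "weasyprint": "weasyprint",
}


def check_dependencies(required=REQUIRED_MODULES):
    """Return the pip names of packages whose module cannot be found."""
    missing = []
    for package, module in required.items():
        # find_spec only locates the module; it does not execute it
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = check_dependencies()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All Python dependencies found.")
```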

2. Building the Resume-Optimizing Agent

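Three libraries were tried for saving the result as a PDF file:

  • reportlab: implemented in save_as_pdf1. The output was garbled; Chinese characters appeared as solid black rectangles in the PDF.
  • fpdf2: implemented in save_as_pdf2. The output was garbled; Chinese characters appeared as question marks.
  • weasyprint: implemented in save_as_pdf. No garbled text; its Unicode support is comparatively good.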
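The question marks from fpdf2 are not only a font problem: `save_as_pdf2` passes each line through `line.encode('latin-1', 'replace').decode('latin-1')`. Latin-1 contains no Chinese characters, so every one of them is replaced by `?` before fpdf2 ever draws the text. A minimal standard-library reproduction (the helper name `to_latin1` and the sample string are made up for illustration):

```python
def to_latin1(text):
    # Same transformation as in save_as_pdf2: characters that Latin-1
    # cannot encode are replaced with "?"
    return text.encode("latin-1", "replace").decode("latin-1")


sample = "张三 Resume"
print(to_latin1(sample))  # ?? Resume
print(to_latin1("café"))  # café (é is part of Latin-1)
```

So rendering Chinese with fpdf2 requires both dropping this conversion and embedding a TTF font that actually contains CJK glyphs; DejaVuSans, which the code tries to load, does not.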
The complete script follows; process_resume uses the WeasyPrint version, save_as_pdf.
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_ollama import ChatOllama
from langchain_core.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Configuration
base_url = "http://localhost:11434"
model = "qwen2.5"

# Initialize LLM
llm = ChatOllama(model=model, base_url=base_url)

# System prompt for resume polishing
system_prompt = SystemMessagePromptTemplate.from_template(
    """You are an expert HR consultant and resume writer. You specialize in optimizing resumes for Applicant Tracking Systems (ATS) 
    and making them visually appealing while maintaining all relevant information. You always provide well-structured, professional 
    responses in proper English."""
)

# Resume polishing prompt
polish_prompt = HumanMessagePromptTemplate.from_template("""
    You are tasked with taking a parsed resume and creating an improved, professionally formatted version.
    
    Original Resume Content:
    {resume_content}
    
    Please perform the following tasks:
    1. Restructure the information into a clear, logical format with appropriate sections
    2. Improve wording and phrasing to make it more professional and impactful
    3. Ensure consistent formatting throughout
    4. Optimize for readability while preserving all original information
    5. Fill in any missing section headers if needed
    
    Expected Output Structure:
    - Contact Information (Name, Email, Phone, LinkedIn/Website if available)
    - Professional Summary (A brief 2-3 sentence overview)
    - Technical Skills (Organized by category)
    - Work Experience (With company, role, dates, and bullet-point responsibilities/achievements)
    - Education (Degree, institution, graduation date)
    - Projects (If applicable)
    - Certifications (If applicable)
    - Additional Information (Languages, interests, etc.)
    
    Important Guidelines:
    - Maintain all factual information from the original
    - Use strong action verbs
    - Quantify achievements where possible
    - Keep formatting clean and professional
    - Do not add fictional information
    
    Provide only the polished resume content in a structured format, without any markdown or extra explanations.
""")

def load_pdf(file_path):
    """
    Load and parse PDF file
    
    Args:
        file_path (str): Path to the PDF file
        
    Returns:
        str: Parsed content from the PDF
    """
    loader = PyMuPDFLoader(file_path)
    docs = loader.load()
    return "\n".join([doc.page_content for doc in docs])

def polish_resume(resume_content):
    """
    Polish resume content using LLM
    
    Args:
        resume_content (str): Raw parsed resume content
        
    Returns:
        str: Polished resume content
    """
    messages = [system_prompt, polish_prompt]
    template = ChatPromptTemplate.from_messages(messages)
    
    polish_chain = template | llm | StrOutputParser()
    return polish_chain.invoke({"resume_content": resume_content})

def save_as_txt(content, output_path):
    """
    Save polished content as text file
    
    Args:
        content (str): Polished resume content
        output_path (str): Output file path
    """
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(content)

# implemented by weasyprint
def save_as_pdf(content, output_path):
    """
    Save polished content as PDF file using WeasyPrint (best Unicode support)

    Note: on Windows, WeasyPrint also needs the GTK runtime (see section 3).

    Args:
        content (str): Polished resume content
        output_path (str): Output PDF file path
    """
    import os
    from html import escape

    # Fallback path: same name with a .txt extension
    txt_path = os.path.splitext(output_path)[0] + '.txt'
    try:
        # Missing GTK libraries raise OSError here, handled by the generic except below
        from weasyprint import HTML

        # Escape &, < and > so the LLM output is rendered as plain text, not markup
        safe_content = escape(content)
        html_content = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <meta charset="UTF-8">
            <style>
                body {{
                    font-family: "Microsoft YaHei", "SimSun", "PingFang SC", sans-serif;
                    font-size: 12pt;
                    line-height: 1.5;
                    margin: 1in;
                }}
                pre {{
                    font-family: inherit;  /* <pre> defaults to a monospace font */
                    white-space: pre-wrap;
                    word-wrap: break-word;
                }}
            </style>
        </head>
        <body>
            <pre>{safe_content}</pre>
        </body>
        </html>
        """

        # Generate PDF
        HTML(string=html_content).write_pdf(output_path)
        print(f"PDF saved to {output_path}")

    except ImportError:
        print("WeasyPrint not installed. Install with: pip install weasyprint")
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path}")
    except Exception as e:
        print(f"Error creating PDF with WeasyPrint: {e}")
        # Fallback to text file
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path} as fallback")

# implemented by fpdf2
def save_as_pdf2(content, output_path, font_path=None):
    """
    Save polished content as PDF file using fpdf2

    Chinese characters become "?" if each line is passed through
    encode('latin-1', 'replace'). To render Chinese, pass font_path pointing
    to a TTF font that contains CJK glyphs, for example
    "C:/Windows/Fonts/simhei.ttf" (an example path; adjust for your system).
    DejaVuSans.ttf does not contain Chinese glyphs.

    Args:
        content (str): Polished resume content
        output_path (str): Output PDF file path
        font_path (str, optional): Path to a CJK-capable TTF font
    """
    import os

    # Fallback path: same name with a .txt extension
    txt_path = os.path.splitext(output_path)[0] + '.txt'
    try:
        from fpdf import FPDF

        pdf = FPDF()
        pdf.add_page()
        pdf.set_auto_page_break(auto=True, margin=15)

        if font_path:
            # An embedded TTF font lets fpdf2 write Unicode text directly
            pdf.add_font('CJK', '', font_path)
            pdf.set_font('CJK', '', 12)
        else:
            # Core fonts such as Helvetica only cover Latin-1
            pdf.set_font('Helvetica', '', 12)

        # Add content line by line
        for line in content.split('\n'):
            if not line.strip():
                pdf.ln(10)  # keep blank lines as vertical space
                continue
            if not font_path:
                # Without a Unicode font, non-Latin-1 characters become "?"
                line = line.encode('latin-1', 'replace').decode('latin-1')
            # multi_cell wraps long lines; new_x/new_y replace the deprecated ln argument
            pdf.multi_cell(0, 10, line, new_x="LMARGIN", new_y="NEXT")

        pdf.output(output_path)
        print(f"PDF saved to {output_path}")

    except ImportError:
        print("fpdf2 not installed. Saving as TXT instead.")
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path}")
    except Exception as e:
        print(f"Error creating PDF: {e}")
        # Fallback to text file
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path} as fallback")

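# implemented by reportlab
def save_as_pdf1(content, output_path):
    """
    Save polished content as PDF file using reportlab

    The 'Normal' sample style uses Helvetica, which has no Chinese glyphs, so
    Chinese characters are drawn as black boxes. Registering reportlab's
    built-in CID font STSong-Light avoids that without a font file (the
    glyphs are supplied by the PDF viewer rather than embedded).

    Args:
        content (str): Polished resume content
        output_path (str): Output PDF file path
    """
    import os

    # Fallback path: same name with a .txt extension
    txt_path = os.path.splitext(output_path)[0] + '.txt'
    try:
        from xml.sax.saxutils import escape
        from reportlab.lib.pagesizes import letter
        from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
        from reportlab.pdfbase import pdfmetrics
        from reportlab.pdfbase.cidfonts import UnicodeCIDFont
        from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer

        # Register a Simplified Chinese CID font that ships with reportlab
        pdfmetrics.registerFont(UnicodeCIDFont('STSong-Light'))
        styles = getSampleStyleSheet()
        cjk_style = ParagraphStyle(
            'CJKNormal',
            parent=styles['Normal'],
            fontName='STSong-Light',
            wordWrap='CJK',  # allow line breaks between Chinese characters
        )

        doc = SimpleDocTemplate(output_path, pagesize=letter)
        story = []

        # Split content into lines and create paragraphs
        for line in content.split('\n'):
            if line.strip():
                # Paragraph parses XML-like markup, so escape &, < and >
                story.append(Paragraph(escape(line), cjk_style))
                story.append(Spacer(1, 12))

        doc.build(story)
        print(f"PDF saved to {output_path}")
    except ImportError:
        print("reportlab not installed. Saving as TXT instead.")
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path}")
    except Exception as e:
        print(f"Error creating PDF with reportlab: {e}")
        save_as_txt(content, txt_path)
        print(f"TXT saved to {txt_path} as fallback")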

def process_resume(input_pdf_path, output_path):
    """
    Main function to process resume from PDF to polished format
    
    Args:
        input_pdf_path (str): Input PDF file path
        output_path (str): Output file path (.txt or .pdf)
    """
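    import os

    # Load PDF
    print("Loading PDF...")
    raw_content = load_pdf(input_pdf_path)

    # Polish content
    print("Polishing resume with LLM...")
    polished_content = polish_resume(raw_content)

    # Make sure the output directory (e.g. polished_resumes/) exists
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)

    # Save result (save_as_pdf falls back to a .txt file if PDF creation fails)
    if output_path.endswith('.pdf'):
        save_as_pdf(polished_content, output_path)
    else:
        save_as_txt(polished_content, output_path)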
    
    print(f"Polished resume saved to {output_path}")
    return polished_content


if __name__ == "__main__":
    input_path = "resume/test_resume.pdf"
    output_path = "polished_resumes/test_polished.pdf"  # or .txt
    polished = process_resume(input_path, output_path)
    print(polished)
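The original `save_as_pdf` puts the LLM output inside `<pre>` unchanged. If the resume text contains `<`, `>` or `&` (for example "C++ & Java" or "<5 years"), WeasyPrint parses them as markup and parts of the text can disappear. The sketch below is a hypothetical helper, `build_resume_html`, that escapes the content with the standard `html.escape` before embedding it; the template is a trimmed-down version of the one in `save_as_pdf`, and its result is meant to be passed to `weasyprint.HTML(string=...)`.

```python
from html import escape


def build_resume_html(content):
    """Wrap plain-text resume content in a minimal HTML page for WeasyPrint."""
    # html.escape turns &, <, > and quotes into entities,
    # so the text is displayed literally instead of parsed as markup
    safe = escape(content)
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
  body {{ font-family: "Microsoft YaHei", "SimSun", "PingFang SC", sans-serif; }}
  pre {{ font-family: inherit; white-space: pre-wrap; }}
</style>
</head>
<body><pre>{safe}</pre></body>
</html>"""


page = build_resume_html("Skills: C++ & Java, <5 years")
print("C++ &amp; Java, &lt;5 years" in page)  # True
```

The `font-family: inherit` rule on `pre` matters as well: `<pre>` gets a monospace font by default, so without it the Chinese font list set on `body` would not apply to the resume text.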

3. Running and Testing the Agent

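When running the WeasyPrint version, the following error was reported:

"Error creating PDF with WeasyPrint: cannot load library 'libgobject-2.0-0': error 0x7e. Additionally, ctypes.util.find_library() did not manage to locate a library called 'libgobject-2.0-0'

TXT saved to polished_resumes/test_polished.txt as fallback"

Further research showed that WeasyPrint on Windows requires the GTK runtime, which can be downloaded from https://github.com/tschoonj/GTK-for-Windows-Runtime-Environment-Installer, and that the directory containing the GTK executables must be added to the PATH environment variable to fix this error. Because of the error, no PDF was generated and the script fell back to writing a TXT file; the text in that TXT file was not garbled.

However, company and university names in the output were not translated into English and still appeared in Chinese. Overall, the optimized resume works well as a reference: after opening it in Word or WPS and making a few light edits, it is usable. Readers who want to go further can try to produce the final result in a single step and solve the problem of untranslated company and school names.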
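Since company and university names may stay in Chinese, a simple way to see what still needs attention is to scan the polished output for Chinese characters. The sketch below is a hypothetical helper, not part of the original script. It uses a regular expression over the CJK Unified Ideographs block (U+4E00 to U+9FFF), which covers common Chinese characters but not the CJK extension blocks or full-width punctuation. The list it returns can be reviewed by hand or, as an untested idea, sent back to the LLM in a second, translation-only prompt.

```python
import re

# Runs of characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF)
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def find_untranslated(text):
    """Return unique runs of Chinese characters, in order of first appearance."""
    found = []
    for run in CJK_RUN.findall(text):
        if run not in found:
            found.append(run)
    return found


# Made-up sample output with placeholder names
polished = (
    "Software Engineer, 某某科技有限公司 (2019-2023)\n"
    "B.S. in Computer Science, 某某大学\n"
    "Internship: 某某科技有限公司"
)
print(find_untranslated(polished))  # ['某某科技有限公司', '某某大学']
```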
