Python docx: Creating and Manipulating Word Documents in Python

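With the python-docx library (imported as docx), you can perform a variety of tasks:

Creating new documents: generate a new Word document from scratch or from a template. This is useful for automatically producing reports, letters, and other kinds of documents.
Modifying existing documents: open an existing Word document and change its content, formatting, styles, and so on. This is especially handy for automatically updating documents that follow a fixed structure.
Adding content: add paragraphs, headings, tables, images, and other elements to a document, which makes it possible to fill a document with data dynamically.
Formatting: apply formatting options to the text and elements in a document, such as font, color, and alignment.
Extracting information: pull text, images, tables, and other content out of an existing Word document for further analysis.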

Docx functions

1. Creating and Saving Documents

Document(): create a new Word document
document.save('filename.docx'): save the document to a file (*.docx)

2. Paragraphs and Text

add_paragraph('text'): add a new paragraph containing the given text.
paragraph.text: get or set the text content of a paragraph.

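3. Headings (the heading level is configurable)

add_heading('text', level=n): add a heading with the given text and level. Valid levels are 0 to 9; level 0 uses the 'Title' style and levels 1 to 9 use 'Heading 1' to 'Heading 9'.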

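4. Styles and Formatting

paragraph.style = 'StyleName': apply a named paragraph style
run = paragraph.add_run('text'): append a run (a span of text that shares one set of character formatting) to the paragraph
run.bold, run.italic, etc.: apply character formatting to the run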

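5. Tables

add_table(rows, cols): add a table with the given number of rows and columns
table.cell(row, col): get a specific cell of the table (indices are zero-based)
cell.text: get or set the text content of a cell
table.rows, table.columns: access the rows and columns of the table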

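6. Images

document.add_picture('image_path'): add an image to the document, in a new paragraph of its own
run.add_picture('image_path'): add an image inside a specific run, for example when a photo must sit at a fixed position in a resume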

7. Document Properties

document.core_properties.title: get or set the document title
document.core_properties.author: get or set the document author
document.core_properties.keywords: get or set the document keywords

8. Sections and Page Setup

section = document.sections[0]: get the first section of the document
section.page_width, section.page_height: get or set the page dimensions

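9. Lists

These correspond to lists in Markdown: the two entries below form an unordered (bulleted) list, while the numbered section headings 1, 2, 3, ... in this article are an example of an ordered list. Note that the built-in style names contain a space.

add_paragraph('text', style='List Bullet'): add an item to a bulleted list
add_paragraph('text', style='List Number'): add an item to a numbered list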

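10. Hyperlinks

python-docx has no built-in run.add_hyperlink('url', 'text') method. A hyperlink has to be assembled from the underlying XML elements, for example with a helper such as the add_hyperlink(paragraph, url, text, color, underline) function defined in the "Modifying an existing Word document" example further down.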

11. Document Modification

document.paragraphs: access all paragraphs in the document body
document.tables: access all tables in the document body
document.styles: access and manipulate the document's styles

12. Document Reading

Document('filename.docx'): open an existing Word file
document.paragraphs[0].text: access the text of the first paragraph

Short Examples

1. Installation

pip install python-docx

2. Creating a new Word document

Create a document that contains text, headings, tables, an image, and formatting:

Create a new Document object.
Add a title with centered alignment.
Add a paragraph with bold and italic text.
Add a heading and a bulleted list.
Add tables with custom column widths.
Add an image to the document.
Save the document as 'example_new_document.docx'.

from docx import Document
from docx.shared import Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH

# Create a new document
doc = Document()

# Add a title
title = doc.add_heading('Document Creation Example', level=1)
title.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Add a paragraph with bold and italic text
paragraph = doc.add_paragraph('This is a sample document created using the python-docx library.')
run = paragraph.runs[0]
run.bold = True
run.italic = True

# Add a heading
doc.add_heading('Section 1: Introduction', level=2)

# Add a bulleted list: one paragraph per item, using the built-in 'List Bullet' style
first_item = doc.add_paragraph(style='List Bullet')
first_item.add_run('Bullet 1').bold = True
first_item.add_run(' - This is the first bullet point.')
second_item = doc.add_paragraph(style='List Bullet')
second_item.add_run('Bullet 2').bold = True
second_item.add_run(' - This is the second bullet point.')

# Add a table
doc.add_heading('Section 2: Data', level=2)
table_1 = doc.add_table(rows=1, cols=2)
table_1.style = 'Table Grid'
table_1.autofit = False  # turn off autofit so the widths set below are not auto-adjusted
for row in table_1.rows:
    for cell in row.cells:
        cell.width = Pt(150)
table_1.cell(0, 0).text = 'cat'
table_1.cell(0, 1).text = 'dog'

# Four rows: one header row plus three data rows
table_2 = doc.add_table(rows=4, cols=3)
table_2.style = 'Table Grid'
table_2.autofit = False
for row in table_2.rows:
    for cell in row.cells:
        cell.width = Pt(100)
table_2.cell(0, 0).text = 'Name'
table_2.cell(0, 1).text = 'Age'
table_2.cell(0, 2).text = 'City'
people = [('Alice', '25', 'New York'), ('Bob', '30', 'San Francisco'), ('Charlie', '22', 'Los Angeles')]
# start=1 so the data rows do not overwrite the header in row 0
for i, data in enumerate(people, start=1):
    table_2.cell(i, 0).text = data[0]
    table_2.cell(i, 1).text = data[1]
    table_2.cell(i, 2).text = data[2]

# Add an image
doc.add_heading('Section 3: Image', level=2)
doc.add_paragraph('Here is an image of cat:')
doc.add_picture('../imgs/cat.jpg', width=Pt(300))

# Save the document
doc.save('../word_files/example_new_document.docx')

Result (the styling is a bit rough, but ignore that for now):

3. Modifying an existing Word document

Open an existing Word document (the 'example_new_document.docx' created above).
Modify the text, formatting, and alignment of the first paragraph.
Add a new heading.
Add a new paragraph with a hyperlink.
Add a new table with custom column widths and data.
Save the modified document as 'example_modified_document.docx'.

import docx
from docx import Document
from docx.shared import Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH

def add_hyperlink(paragraph, url, text, color, underline):
    """
    A function that places a hyperlink within a paragraph object.

    :param paragraph: The paragraph we are adding the hyperlink to.
    :param url: A string containing the required url
    :param text: The text displayed for the url
    :param color: Hex RGB string for the link color (e.g. 'FF8822'), or None to leave it unset
    :param underline: If False, explicitly turn underlining off for the link text
    :return: The hyperlink object
    """

    # This gets access to the document.xml.rels file and gets a new relation id value
    part = paragraph.part
    r_id = part.relate_to(url, docx.opc.constants.RELATIONSHIP_TYPE.HYPERLINK, is_external=True)

    # Create the w:hyperlink tag and add needed values
    hyperlink = docx.oxml.shared.OxmlElement('w:hyperlink')
    hyperlink.set(docx.oxml.shared.qn('r:id'), r_id)

    # Create a w:r element
    new_run = docx.oxml.shared.OxmlElement('w:r')

    # Create a new w:rPr element
    rPr = docx.oxml.shared.OxmlElement('w:rPr')

    # Add color if it is given
    if color is not None:
        c = docx.oxml.shared.OxmlElement('w:color')
        c.set(docx.oxml.shared.qn('w:val'), color)
        rPr.append(c)

    # Remove underlining if it is requested
    if not underline:
        u = docx.oxml.shared.OxmlElement('w:u')
        u.set(docx.oxml.shared.qn('w:val'), 'none')
        rPr.append(u)

    # Join all the xml elements together and add the required text to the w:r element
    new_run.append(rPr)
    new_run.text = text
    hyperlink.append(new_run)

    paragraph._p.append(hyperlink)

    return hyperlink


# Open an existing document
doc = Document('../word_files/example_new_document.docx')

# Access the first paragraph and modify its text and formatting
first_paragraph = doc.paragraphs[0]
first_paragraph.text = 'Updated Text: 宫廷玉液酒，一百八一杯。'
run = first_paragraph.runs[0]
run.bold = True  # bold
run.italic = True  # italic
run.font.size = Pt(20)  # font size
first_paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER  # centered

# Add a new heading
doc.add_heading('New Section', level=1)

# Add a new paragraph with a hyperlink
new_paragraph = doc.add_paragraph('Visit my blog website: ')
hyperlink = add_hyperlink(new_paragraph,
              'https://blog.csdn.net/weixin_40959890/article/details/137598605?spm=1001.2014.3001.5501',
              'Python docx：在Python中创建和操作Word文档',
              'FF8822', True)

# Add a new table
doc.add_heading('Table Section', level=2)
table = doc.add_table(rows=4, cols=3)  # one header row plus three data rows, three columns
table.style = 'Table Grid'
table.autofit = False  # turn off autofit so the widths set below are not auto-adjusted
for row in table.rows:
    for cell in row.cells:
        cell.width = Pt(100)
table.cell(0, 0).text = 'Name'
table.cell(0, 1).text = 'Age'
table.cell(0, 2).text = 'City'
for i, data in enumerate([('David', '128', 'London'), ('Emma', '135', 'New York'), ('John', '122', 'Los Angeles')], start=1):
    table.cell(i, 0).text = data[0]
    table.cell(i, 1).text = data[1]
    table.cell(i, 2).text = data[2]

# Save the modified document
doc.save('../word_files/example_modified_document.docx')

Here is the result (still not pretty, but the modification worked):
