Python Notes: Multithreading Usage Examples

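Multithreaded programming is a core skill in intermediate-to-advanced Python development. It lets a program make progress on several tasks at once, which noticeably improves the efficiency of I/O-bound applications. Working from practical code examples, this article moves from the basics to more advanced topics and covers how to use threads in Python, what to watch out for, and recommended practices.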

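1. Core Concepts and Advantages of Multithreading

What is multithreading?

Multithreading resembles running several programs at the same time. All threads in a process share its resources (such as memory and file handles), but each thread has its own CPU register context, including its instruction pointer and stack pointer.

Five advantages of multithreading:

Background processing: time-consuming work (such as processing a large file) can run in the background without blocking the main flow.
Better user experience: in a GUI program, a task triggered by a button click can show a progress bar while the interface stays responsive.
Faster execution: tasks that spend time waiting can overlap, so the program finishes sooner. In standard CPython, however, the GIL lets only one thread execute Python bytecode at a time, so CPU-bound pure-Python code does not run in parallel across cores.
Efficient waiting: while waiting for user input, file reads and writes, or network traffic, a thread gives up the CPU so other threads can run.
Lightweight: threads are lighter than processes and cheaper to create and switch between.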
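The "efficient waiting" point can be illustrated with a small timing sketch. It assumes a hypothetical fake_download() helper that stands in for a blocking I/O call by sleeping; real code would read from a socket or file instead, and the exact timings will vary by machine.

```python
import threading
import time

def fake_download(seconds):
    """Hypothetical stand-in for a blocking I/O call: it only sleeps."""
    time.sleep(seconds)

DELAYS = [0.2, 0.2, 0.2]

# Sequential: the total time is roughly the sum of all the waits.
start = time.perf_counter()
for delay in DELAYS:
    fake_download(delay)
sequential_seconds = time.perf_counter() - start

# Threaded: the waits overlap, so the total is roughly the longest single wait.
start = time.perf_counter()
workers = [threading.Thread(target=fake_download, args=(d,)) for d in DELAYS]
for w in workers:
    w.start()
for w in workers:
    w.join()
threaded_seconds = time.perf_counter() - start

print("sequential: %.2fs, threaded: %.2fs" % (sequential_seconds, threaded_seconds))
```

The threaded run finishes sooner because a sleeping (or I/O-blocked) thread releases the GIL and lets the others proceed.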

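2. Evolution of Python's Threading Modules

Version	Module	Status
Python 2	`thread`	Deprecated; renamed in Python 3
Python 3	`_thread`	Low-level compatibility module
Python 2 and 3	`threading`	Recommended

Note: the thread module was renamed to _thread in Python 3 and is kept mainly for backward compatibility. Production code should prefer the threading module.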

3. Basic Examples: Python 2 versus Python 3

Example 1: the Python 2 `thread` module

#!/usr/bin/python
# -*- coding: UTF-8 -*-

import thread
import time

def print_time(threadName, delay):
    count = 0
    while count < 5:
        time.sleep(delay)
        count += 1
        print "%s: %s" % (threadName, time.ctime(time.time()))

try:
    thread.start_new_thread(print_time, ("Thread-1", 2,))
    thread.start_new_thread(print_time, ("Thread-2", 4,))
except thread.error:
    print "Error: unable to start thread"

while 1:
    time.sleep(1)  # keep the main thread alive without spinning the CPU

Example 2: the Python 3 `_thread` module (compatibility style)

#!/usr/bin/python3

import _thread
import time

def print_time(threadName, delay):
    count = 0
    while count < 5:
        time.sleep(delay)
        count += 1
        print ("%s: %s" % (threadName, time.ctime(time.time())))

try:
    _thread.start_new_thread(print_time, ("Thread-1", 2,))
    _thread.start_new_thread(print_time, ("Thread-2", 4,))
except _thread.error:
    print("Error: unable to start thread")

while 1:
    time.sleep(1)  # keep the main thread alive without spinning the CPU

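Key points:

start_new_thread() starts a thread; it takes the function and a tuple of its arguments.
The main thread must stay alive (with a while 1 loop or time.sleep()), because threads started this way are killed as soon as the main thread exits.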
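For comparison, the same two workers can be written with threading.Thread and its target= argument, which removes the need for a keep-alive loop at the end of the script. This is a minimal sketch; the delays and the repeats parameter are shortened, illustrative values rather than the ones used in the examples above.

```python
import threading
import time

def print_time(thread_name, delay, repeats=3):
    """Print the current time `repeats` times, pausing `delay` seconds each time."""
    for _ in range(repeats):
        time.sleep(delay)
        print("%s: %s" % (thread_name, time.ctime()))

t1 = threading.Thread(target=print_time, args=("Thread-1", 0.1))
t2 = threading.Thread(target=print_time, args=("Thread-2", 0.2))

t1.start()
t2.start()

# join() blocks until each thread has finished, so no endless loop is needed.
t1.join()
t2.join()
print("main thread done")
```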

4. Recommended Approach: the `threading` Module

Example 3: subclassing `threading.Thread`

#!/usr/bin/python3

import threading
import time

exitFlag = 0  # set to 1 to ask the worker threads to stop early

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter  # delay in seconds between two prints

    def run(self):  # override run(); start() calls it in the new thread
        print("Starting thread: " + self.name)
        print_time(self.name, self.counter, 5)
        print("Exiting thread: " + self.name)

def print_time(threadName, delay, counter):
    while counter:
        if exitFlag:
            return  # threadName is a plain string, so leave by returning
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

thread1.start()
thread2.start()
thread1.join()  # wait for the thread to finish
thread2.join()
print("Exiting main thread")

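Core methods:

start(): starts the thread, which then calls run() in the new thread.
join(): blocks the calling thread (here the main thread) until the target thread has finished.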
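A module-level flag such as exitFlag works, but threading.Event is the standard-library tool for asking a thread to stop. The sketch below is one possible arrangement, not the article's own method; the ticker() function and the intervals are hypothetical.

```python
import threading
import time

stop_event = threading.Event()
ticks = []

def ticker(interval):
    # Event.wait() returns False on timeout and True once the event is set,
    # so the loop ends promptly after stop_event.set() is called.
    while not stop_event.wait(interval):
        ticks.append(time.time())

worker = threading.Thread(target=ticker, args=(0.05,))
worker.start()

time.sleep(0.3)         # let the worker run for a while
stop_event.set()        # ask it to stop
worker.join(timeout=2)  # wait for it, but never longer than 2 seconds

print("ticks recorded:", len(ticks), "still running:", worker.is_alive())
```

Because the worker waits on the event instead of calling time.sleep(), it reacts to the stop request without finishing a full sleep first.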

5. Practical Case: Multithreaded Batch Processing in a Crawler

Example 4: multithreaded data parsing with `_thread`

# DAO helpers from the author's project (not shown in this article)
from spider.dao.itemLinkDao import *
from spider.dao.categoryPageLinkDao import *
import json
from bs4 import BeautifulSoup
from datetime import datetime
import time
import re
import _thread

def parserawauto(begin, size):
    linkhead = "http://www.525.life/"
    linkend = "/mode_show?token=&user_key=&app_version=2.6.2.1"

    while 1:
        try:
            # Stop once no unprocessed pages remain
            count = countNoDealedPageRaw()
            if count == 0:
                break
            raws = findNoDealedRawLimit(begin, size)
            for raw in raws:
                if raw['source'] == '食物库app':
                    # Pages from the app API carry a JSON payload
                    contentjson = json.loads(raw['content'])
                    for food in contentjson['foods']:
                        link = linkhead + food['code'] + linkend
                        insertItemLink(food['code'], food['name'], raw['link'], link, raw['type'], raw['source'])
                        dealCategoryPageRaw(raw['link'])
                else:
                    # Other pages are HTML; name the parser explicitly
                    soup = BeautifulSoup(raw['content'], "html.parser")
                    div = soup.find("div", class_="widget-food-list")
                    ul = div.find("ul", class_="food-list")
                    for box in ul.find_all("div", class_="text-box"):
                        node = box.find('a', href=re.compile(r'/shiwu/\w+'))
                        code = node['href'].replace("/shiwu/", "")
                        name = node['title']
                        link = linkhead + code + linkend
                        insertItemLink(code, name, raw['link'], link, raw['type'], raw['source'])
                        dealCategoryPageRaw(raw['link'])
                print("dealed %s %s %s" % (raw['source'], raw['type'], raw['link']))
        except Exception as e:
            print(e)
    return "begin " + str(begin) + " finish " + str(datetime.now())

def run():
    # Start 20 threads, each handling a different data offset
    for i in range(0, 2000, 100):
        try:
            _thread.start_new_thread(parserawauto, (i, 100))
        except Exception as e:
            print(e)
            print("Error: unable to start thread")

run()

while 1:  # keep the main thread alive without spinning the CPU
    time.sleep(1)

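Design notes:

Each thread handles the rows at a different offset (begin), so the batches are parsed in parallel.
The inner while 1 loop keeps processing new data until countNoDealedPageRaw() reports that no unprocessed pages remain.
The try-except catches exceptions so that one failure does not kill its thread. Because the exception is only printed, an error that recurs on every pass would keep that thread looping.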
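The same fan-out over offsets can also be sketched with concurrent.futures.ThreadPoolExecutor, which caps the number of threads, waits for them on exit, and hands back return values and exceptions. process_batch() below is a hypothetical stand-in for parserawauto(); it touches no database, and max_workers=5 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_batch(begin, size):
    """Hypothetical stand-in for parserawauto: pretend to handle `size` rows from `begin`."""
    return "begin %d finished (%d rows)" % (begin, size)

offsets = range(0, 2000, 100)  # the same 20 offsets used by run()
results = []

# At most 5 batches run at once; leaving the with-block waits for all of them.
with ThreadPoolExecutor(max_workers=5) as pool:
    futures = {pool.submit(process_batch, begin, 100): begin for begin in offsets}
    for future in as_completed(futures):
        try:
            results.append(future.result())  # re-raises any exception from the worker
        except Exception as exc:
            print("batch %d failed: %r" % (futures[future], exc))

print(len(results), "batches finished")
```

With this arrangement the main thread needs no keep-alive loop, and the string returned by each batch is no longer discarded.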

6. Thread Synchronization: Locks

Example 5: mutual exclusion with `threading.Lock`

#!/usr/bin/python3

import threading
import time

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        print("Starting thread: " + self.name)
        # Hold the lock for the whole print loop; "with" releases it even on error
        with threadLock:
            print_time(self.name, self.counter, 3)

def print_time(threadName, delay, counter):
    while counter:
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

threadLock = threading.Lock()
threads = []

thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

thread1.start()
thread2.start()
threads.append(thread1)
threads.append(thread2)

for t in threads:
    t.join()
print("Exiting main thread")

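Output:

Both threads announce that they have started, but their timestamp lines do not interleave: whichever thread acquires the lock first (normally Thread-1, because it is started first) prints all of its lines before the other can begin. The lock guarantees mutual exclusion, not a particular order.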
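A more typical use of a lock is protecting a shared variable. The sketch below, with a hypothetical add_many() worker, has four threads increment one counter; holding the lock around each read-modify-write keeps the final total exact.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    """Increment the shared counter `times` times, one locked update at a time."""
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread may update the counter at a time
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4 threads x 10000 increments = 40000
```

Without the lock, counter += 1 is a read followed by a write, and two threads can overwrite each other's update.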

7. Thread-Safe Task Queues

Example 6: managing tasks with `queue.Queue`

#!/usr/bin/python3

import queue
import threading
import time

exitFlag = 0

class myThread(threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q

    def run(self):
        print("Starting thread: " + self.name)
        process_data(self.name, self.q)
        print("Exiting thread: " + self.name)

def process_data(threadName, q):
    while not exitFlag:
        queueLock.acquire()
        if not q.empty():
            data = q.get()
            queueLock.release()
            print("%s processing %s" % (threadName, data))
        else:
            queueLock.release()
        time.sleep(1)

threadList = ["Thread-1", "Thread-2", "Thread-3"]
nameList = ["One", "Two", "Three", "Four", "Five"]
queueLock = threading.Lock()
workQueue = queue.Queue(10)
threads = []
threadID = 1

# Create and start the worker threads
for tName in threadList:
    thread = myThread(threadID, tName, workQueue)
    thread.start()
    threads.append(thread)
    threadID += 1

# Fill the task queue
queueLock.acquire()
for word in nameList:
    workQueue.put(word)
queueLock.release()

# Wait until the queue has been drained, sleeping instead of spinning
while not workQueue.empty():
    time.sleep(0.1)

exitFlag = 1  # tell the threads to exit

for t in threads:
    t.join()
print("Exiting main thread")

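Use cases:

Producer-consumer setups where the amount of work is not known in advance.
Crawler systems that need to cap the number of concurrent workers.

Note that queue.Queue is first-in, first-out. When tasks must be handled in priority order, the same module provides queue.PriorityQueue.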
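queue.Queue does its own locking, so a worker pool can also be written without an extra lock or polling loop. The sketch below is one common arrangement, assuming a None sentinel (named STOP here) to tell each worker to quit; the worker() function and the processed list are illustrative names.

```python
import queue
import threading

task_queue = queue.Queue(maxsize=10)
processed = []
processed_lock = threading.Lock()
STOP = None  # sentinel: a worker exits when it receives this

def worker(name):
    while True:
        item = task_queue.get()  # blocks until an item is available
        if item is STOP:
            task_queue.task_done()
            break
        with processed_lock:
            processed.append((name, item))
        task_queue.task_done()   # mark the item as fully handled

workers = [threading.Thread(target=worker, args=("Thread-%d" % i,)) for i in range(1, 4)]
for w in workers:
    w.start()

for word in ["One", "Two", "Three", "Four", "Five"]:
    task_queue.put(word)
for _ in workers:
    task_queue.put(STOP)  # one sentinel per worker

task_queue.join()  # returns once every put() has a matching task_done()
for w in workers:
    w.join()

print(sorted(item for _, item in processed))
```

Blocking on get() means idle workers use no CPU, and the number of worker threads sets the concurrency limit.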

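8. Common Problems and Pitfalls

Problem 1: `Unhandled exception in thread started by`

Cause: a thread started through thread/_thread raised an exception that nothing caught. A common trigger is the main thread ending early, which forcibly terminates the child threads while they are still running.

Solution: make sure the main thread waits for all child threads to finish.

# Option 1: use join() (for threading.Thread objects)
thread1.join()
thread2.join()

# Option 2: keep the main thread running
while 1:
    time.sleep(1)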

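Problem 2: the GIL limits CPU-bound tasks

In standard CPython, the Global Interpreter Lock (GIL) prevents multiple threads from executing CPU-bound Python code in parallel. Use the multiprocessing module for that kind of work.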
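As a rough sketch of the multiprocessing route, the example below uses concurrent.futures.ProcessPoolExecutor, which is built on multiprocessing. The count_primes() workload, the limits, and max_workers=2 are illustrative assumptions; the `if __name__ == "__main__"` guard matters because worker processes may re-import the script.

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound toy workload: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    limits = [20000, 20000, 20000, 20000]
    # Each call runs in a separate process with its own interpreter and GIL.
    with ProcessPoolExecutor(max_workers=2) as pool:
        totals = list(pool.map(count_primes, limits))
    print(totals)
```

Arguments and results are pickled between processes, so this pays off when each task does substantial computation relative to the data it exchanges.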

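Problem 3: deadlock

A deadlock occurs when several threads each wait for a resource that another one holds.

Prevention: acquire multiple locks in the same order in every thread, use threading.RLock (a reentrant lock) when one thread may need to re-acquire a lock it already holds, and manage locks with the with statement so they are always released.

import threading

lock = threading.Lock()
with lock:
    # The lock is acquired on entry and released on exit, even if an exception is raised
    critical_section()  # placeholder for the code that touches shared state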
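Consistent lock ordering can be sketched as follows. The Account class and transfer() function are hypothetical; the point is that both locks are always taken in one fixed order (here, by account name), so two opposite transfers can never each hold one lock while waiting for the other.

```python
import threading

class Account:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Sort the two accounts so every thread locks them in the same order.
    first, second = sorted((src, dst), key=lambda acct: acct.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

a = Account("a", 100)
b = Account("b", 100)

def many_transfers(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)

# The two threads move money in opposite directions at the same time.
t1 = threading.Thread(target=many_transfers, args=(a, b))
t2 = threading.Thread(target=many_transfers, args=(b, a))
t1.start()
t2.start()
t1.join()
t2.join()

print(a.balance, b.balance)  # 100 100
```

If transfer() instead locked src first and dst second, the two threads could each take one lock and block forever on the other.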

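9. Performance Trade-offs and Selection Guide

Scenario	Recommended approach	Reason
I/O-bound (web crawlers)	`threading` + queue	Low thread-switching overhead; waits overlap well
CPU-bound (computation)	`multiprocessing`	Sidesteps the GIL and uses multiple cores
High-concurrency asynchronous tasks	`asyncio`	Single-threaded coroutines, lighter than threads
Simple background tasks	`_thread` or `threading`	Quick to implement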
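For the asyncio row, a minimal sketch looks like the following. The fetch() coroutine is hypothetical and uses asyncio.sleep() in place of a real network call; all three coroutines run on a single thread and overlap only while they are awaiting.

```python
import asyncio

async def fetch(name, delay):
    """Hypothetical request: wait `delay` seconds, then return the name."""
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return name

async def main():
    # gather() runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.1),
        fetch("c", 0.05),
    )

results = asyncio.run(main())
print(results)  # ['a', 'b', 'c']
```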

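10. Summary

Module choice: in Python 3, prefer the threading module.
Thread safety: protect shared resources with a lock to avoid data races.
Main-thread management: with _thread, the main thread must outlive its child threads, or they are killed when it exits; with threading, call join() whenever the main thread needs the workers to have finished.
Task queues: queue.Queue combined with worker threads is the standard way to build a producer-consumer pipeline.
Exception handling: catch exceptions inside each thread. An uncaught exception ends that thread, and the failure is easy to miss.

Multithreading is a powerful way to make programs more efficient, but it has to fit the workload. For I/O-bound tasks, threads can raise throughput considerably; for CPU-bound tasks, multiple processes are usually the better choice.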
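The exception-handling point can be sketched by wrapping each thread's work in a guard that records failures in a queue.Queue for the main thread to inspect. risky_task() and guarded() are hypothetical names, and the failure for n == 2 is contrived for illustration.

```python
import queue
import threading

errors = queue.Queue()  # thread-safe place to collect failures

def risky_task(n):
    """Hypothetical unit of work that fails for one particular input."""
    if n == 2:
        raise ValueError("task %d failed" % n)

def guarded(n):
    try:
        risky_task(n)
    except Exception as exc:
        errors.put((n, exc))  # record the failure instead of losing it

threads = [threading.Thread(target=guarded, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All workers have finished, so the queue can be drained safely.
failed = []
while not errors.empty():
    failed.append(errors.get())

for n, exc in failed:
    print("task %d raised %r" % (n, exc))
```

The other three threads finish normally, and the main thread decides afterwards whether to retry or report the failed task.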
