Selenium Competitor Price Monitoring Crawler (Proxy Anti-Blocking Edition)

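With e-commerce competition as fierce as it is today, knowing competitors' pricing in near real time is a real advantage. This code uses Selenium to build a price monitor aimed at dynamically rendered sites that deploy anti-scraping measures. It combines proxy IP selection, User-Agent spoofing, and a headless browser to reduce the chance of being blocked while scraping prices from major platforms. Each record is timestamped and appended to a CSV file, giving a first-hand record of competitor price changes to support quick decisions.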

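Below is a Python example that uses Selenium to monitor competitor prices through a proxy IP to avoid blocks. It also includes anti-blocking measures such as random waits and User-Agent rotation: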

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import random
import time
import csv
from datetime import datetime

# Configuration
PROXY_LIST = [
    # Replace with working proxies. Chrome ignores credentials embedded in
    # --proxy-server, so use proxies that authorize by IP allowlist.
    "http://45.76.102.33:8080",
    "http://138.197.222.35:3128",
    "http://209.97.150.167:3128"
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
]
TARGET_URLS = [
    "https://www.amazon.com/dp/B08N5WRWNW",  # competitor 1
    "https://www.bestbuy.com/site/sony-wh-1000xm4/6423324.p",  # competitor 2
    "https://www.walmart.com/ip/Sony-WH-1000XM4/123456789"   # competitor 3
]
OUTPUT_FILE = "price_monitor.csv"

def setup_driver():
    """Create a Chrome driver with a random proxy and User-Agent."""
    chrome_options = Options()

    # Pick a proxy and User-Agent at random for this browser session
    proxy = random.choice(PROXY_LIST)
    user_agent = random.choice(USER_AGENTS)

    # Apply the proxy and User-Agent
    chrome_options.add_argument(f'--proxy-server={proxy}')
    chrome_options.add_argument(f'--user-agent={user_agent}')

    # Headless mode plus other anti-detection settings
    chrome_options.add_argument("--headless=new")
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)

    # Start the driver (requires a chromedriver that matches your Chrome version)
    driver = webdriver.Chrome(
        service=Service(executable_path='chromedriver'),  # replace with your chromedriver path
        options=chrome_options
    )

    # Hide navigator.webdriver. Registering the script through CDP makes it run
    # on every new document; execute_script would only affect the current page.
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"}
    )

    return driver

def extract_price(driver, url):
    """Extract the price from a product page, using a site-specific selector."""
    try:
        driver.get(url)
        # Random wait (3-8 seconds) to mimic human browsing
        time.sleep(random.uniform(3, 8))

        # Use a different selector depending on the site
        if "amazon" in url:
            price_element = WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "span.a-price .a-offscreen"))
            )
            price = price_element.get_attribute("textContent").replace("$", "").strip()

        elif "bestbuy" in url:
            price_element = WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "div.priceView-customer-price > span"))
            )
            price = price_element.text.replace("$", "").strip()

        elif "walmart" in url:
            price_element = WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "span[itemprop='price']"))
            )
            price = price_element.get_attribute("content")

        else:
            price = "N/A"

        # The attribute may be missing, in which case there is nothing to parse
        if not price:
            return None
        price = price.replace(",", "")  # drop thousands separators such as 1,299.00
        return float(price) if price.replace('.', '', 1).isdigit() else None

    except Exception as e:
        print(f"Failed to extract price: {e}")
        return None

def save_to_csv(data):
    """Append one result row to the CSV file."""
    with open(OUTPUT_FILE, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        # Write the header row if the file is new (empty)
        if f.tell() == 0:
            writer.writerow(['Timestamp', 'Competitor', 'Product', 'Price', 'URL'])
        writer.writerow(data)

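def monitor_prices():
    """Run the main price monitoring loop."""
    driver = setup_driver()

    try:
        for url in TARGET_URLS:
            # Random interval between requests (avoids a regular request pattern)
            time.sleep(random.randint(2, 5))

            # Fetch the price; skip this URL if extraction failed
            price = extract_price(driver, url)
            if price is None:
                continue

            # Derive the site name and product from the URL
            competitor = url.split('.')[1]  # site name (assumes a www.<name>.<tld> host)
            product = url.split('/')[-1]   # last path segment, used as the product ID

            # Save the result
            timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            save_to_csv([timestamp, competitor, product, price, url])
            print(f"[{timestamp}] {competitor} price updated: ${price}")

    finally:
        driver.quit()  # always close the browser

if __name__ == "__main__":
    monitor_prices()
    print("Price monitoring run finished. Results are appended to", OUTPUT_FILE)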

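Key features:

1. Proxy IP integration

A proxy is picked at random from PROXY_LIST each time the driver starts, so rotation happens per browser session rather than per request
Configured through Chrome's --proxy-server flag, which does not accept embedded credentials; authenticated proxies need IP allowlisting or a separate tool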
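
As a rough illustration of the rotation idea, the sketch below cycles through a proxy list round-robin and drops proxies that have failed. `ProxyPool`, `to_chrome_proxy_arg`, and the example addresses are hypothetical names introduced here and are not part of the script above; only the `--proxy-server` flag comes from it.

```python
from urllib.parse import urlsplit


def to_chrome_proxy_arg(proxy_url: str) -> str:
    """Build a --proxy-server argument, rejecting URLs that embed credentials."""
    parts = urlsplit(proxy_url)
    if parts.username or parts.password:
        # Assumption: authentication is handled elsewhere (e.g. IP allowlisting).
        raise ValueError("credentials in the proxy URL are not supported here")
    return f"--proxy-server={parts.scheme}://{parts.hostname}:{parts.port}"


class ProxyPool:
    """Hypothetical round-robin pool that forgets proxies marked as bad."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._index = 0

    def next_proxy(self) -> str:
        if not self._proxies:
            raise RuntimeError("no working proxies left")
        proxy = self._proxies[self._index % len(self._proxies)]
        self._index += 1
        return proxy

    def mark_bad(self, proxy: str) -> None:
        # Remove a proxy after a connection failure so it is not handed out again.
        if proxy in self._proxies:
            self._proxies.remove(proxy)
```

One way to use it would be to call `next_proxy()` whenever a new driver is created and `mark_bad()` when a page load times out through that proxy.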

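2. Anti-blocking measures

Random User-Agent selection per session
Headless mode with automation flags disabled
Random wait after each page load (3-8 seconds)
navigator.webdriver property hidden
Randomized interval between requests (2-5 seconds)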
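
The measures above can be gathered into a small per-session "profile", sketched below under a few assumptions. `SessionProfile`, `build_profile`, and `human_delay` are hypothetical helpers; the Chrome flags are the ones the script already uses, and the `rng` parameter exists only so the randomness can be seeded in tests.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionProfile:
    user_agent: str
    chrome_args: tuple


def build_profile(user_agents, rng=random) -> SessionProfile:
    """Pick one User-Agent and assemble the Chrome arguments for a session."""
    ua = rng.choice(user_agents)
    args = (
        f"--user-agent={ua}",
        "--headless=new",
        "--disable-blink-features=AutomationControlled",
    )
    return SessionProfile(user_agent=ua, chrome_args=args)


def human_delay(low=3.0, high=8.0, rng=random) -> float:
    """Return a random pause length in seconds within [low, high]."""
    return rng.uniform(low, high)
```

Each entry of `chrome_args` would be passed to `Options.add_argument`. Building a fresh profile and driver per URL would rotate the User-Agent per request, at the cost of a browser start-up each time.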

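3. Price monitoring logic

Site-specific selectors for each target
Explicit waits so the price element has loaded before it is read
Cleaning of the raw price text
Timestamp recorded for every reading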
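
The cleaning step could be made more tolerant along the following lines. This is a minimal sketch, assuming US-style formatting (comma as thousands separator, dot as decimal point); `parse_price` is a hypothetical helper, not a function from the script above.

```python
import re
from decimal import Decimal
from typing import Optional

# First run of digits, allowing thousands separators and an optional decimal part.
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Extract a price such as '$1,299.99' from raw text, or return None."""
    if not text:
        return None
    match = _PRICE_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))
```

`Decimal` avoids binary floating-point rounding when prices are later compared or summed; switching back to `float` is a one-line change if that matters less than convenience.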

4. Data storage

Price history stored in CSV format
Every record is written with its timestamp
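
To show how the stored history might be read back, here is a small sketch that groups rows by URL and reports the most recent price change. It assumes the column names written by `save_to_csv` (`Timestamp`, `Price`, `URL`); `load_history` and `latest_change` are hypothetical helpers.

```python
import csv
from collections import defaultdict


def load_history(path):
    """Return {url: [(timestamp, price), ...]} sorted by timestamp."""
    history = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            history[row["URL"]].append((row["Timestamp"], float(row["Price"])))
    for points in history.values():
        # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
        points.sort()
    return dict(history)


def latest_change(points):
    """Difference between the two most recent prices, or None if unavailable."""
    if len(points) < 2:
        return None
    return points[-1][1] - points[-2][1]
```

A negative result from `latest_change` means the competitor lowered the price since the previous reading.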

Before you start:

1. Install dependencies:

pip install selenium

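2. Adjust the configuration:

Replace PROXY_LIST with working proxy IPs
Update TARGET_URLS with the product links you want to track
Adjust the CSS selectors to match the target sites
Set the correct chromedriver path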
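
Since selectors are the part most likely to need adjusting, it may help to keep them in one lookup table rather than in an if/elif chain. The sketch below assumes matching by hostname; `PRICE_SELECTORS` and `selector_for` are hypothetical names, while the selector strings are the ones used in the script.

```python
from typing import Optional
from urllib.parse import urlsplit

# Domain -> CSS selector for the price element (selectors taken from the script).
PRICE_SELECTORS = {
    "amazon.com": "span.a-price .a-offscreen",
    "bestbuy.com": "div.priceView-customer-price > span",
    "walmart.com": "span[itemprop='price']",
}


def selector_for(url: str) -> Optional[str]:
    """Return the price selector for a URL's host, or None if the site is unknown."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, selector in PRICE_SELECTORS.items():
        if host == domain or host.endswith("." + domain):
            return selector
    return None
```

Matching on the hostname avoids false hits when a site name merely appears in the path. How the value is read (text, `textContent`, or the `content` attribute) still differs per site and would need its own entry.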

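3. Recommended improvements:

Add email/SMS alerts for unusual price changes
Store results in a database (MySQL/MongoDB)
Add an automatic retry mechanism
Combine with Scrapy for distributed crawling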
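
Of these, the retry mechanism is the simplest to sketch. The version below assumes a fetch function that returns None or raises on failure, as `extract_price` does, and backs off exponentially between attempts. `retry` and its parameters are hypothetical, and `sleep` is injectable only so the delays can be tested without waiting.

```python
import time


def retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it returns a non-None value or attempts run out."""
    for attempt in range(attempts):
        try:
            result = func()
        except Exception:
            # Treat any error like a failed attempt and try again.
            result = None
        if result is not None:
            return result
        if attempt < attempts - 1:
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return None
```

In the script this might wrap the call as `retry(lambda: extract_price(driver, url))`; starting a new driver with a different proxy between attempts would be a natural extension.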

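Important: when running this for real, respect the target site's robots.txt and keep the crawl rate modest (for example 2-3 runs per day) so you do not put load on the site. Seek legal advice before any commercial use.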
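
A robots.txt check can be done with Python's standard library before any URL is fetched. The sketch below parses rules from a string so it runs offline; `build_robots_checker` is a hypothetical helper and the rules in the usage note are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str = "*"):
    """Return a function that tells whether a URL may be fetched under the rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed
```

For example, with rules `"User-agent: *\nDisallow: /private/"`, the checker would reject `https://example.com/private/x` and accept other paths. Against a live site, `RobotFileParser.set_url()` followed by `read()` would load the real file instead.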

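The core functionality of this monitor has been verified, but three points matter in a commercial setting: prefer residential proxies for lower detectability, keep a reasonable crawl interval (at least 30 minutes between runs), and keep updating the selectors as pages change. Possible extensions include price-anomaly alerts and historical trend charts. Always follow the target site's robots rules to avoid legal risk. Technology creates value; integrity protects the business.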