I. Advantages of Selenium and Click Operations

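1. Environment setup

Tools: Chrome browser + chromedriver + selenium

Windows users: place chromedriver.exe in the same directory as python.exe

macOS users: the driver path is /usr/local/bin/chromedriver

Linux users: install the driver yourself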
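Before running any script, it can help to confirm that the driver is actually discoverable. The following is a minimal sketch using only the standard library; the helper name `find_driver` is hypothetical, and it assumes the driver is meant to be found through PATH (which is what the locations above rely on).

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    # shutil.which searches PATH the way the shell would
    # (on Windows it also tries extensions such as .exe).
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    print(path if path else "chromedriver not found on PATH")
```

If the function returns None, the driver is either not installed or sits in a directory that is not on PATH.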

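2. Advantages of Selenium

Selenium drives the browser directly, so there is no need to analyze requests or decrypt data

The program can read the page source, then parse it and extract content

The program can interact directly with page elements, for example by clicking them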

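```python
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://shanzhi.spbeen.com/'
cb = webdriver.Chrome()
cb.get(url)

# find_element_by_xpath was removed in Selenium 4.3;
# find_element(By.XPATH, ...) is the current form.
# Type the keyword ("开发" means "development") into the search box.
word_search_input = cb.find_element(By.XPATH, './/input[@name="word"]')
word_search_input.send_keys("开发")
sleep(2)

# Click the search button.
search_button = cb.find_element(By.XPATH, './/form[@action="/search/"]/button')
search_button.click()
sleep(3)

# Click the next-page link five times, pausing after each click.
for _ in range(5):
    next_element = cb.find_element(By.XPATH, './/div[@class="col-4"]/a[1]')
    next_element.click()
    sleep(3)

sleep(5)
cb.quit()
```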

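3. Summary:

Using Selenium lowers development difficulty and improves development efficiency

Selenium can operate on page elements directly, for example by clicking them

Selenium makes the program run slower, because the browser loads the whole page, including content a plain HTTP request would never fetch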
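Fixed `sleep` calls add to that slowdown, because the script always waits the full duration even when the page is ready earlier. One common mitigation is to poll for a condition instead. The sketch below is a generic, standard-library illustration of that idea; `wait_until` is a hypothetical helper, not part of Selenium (Selenium ships its own `WebDriverWait` for the same purpose).

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(interval)
```

With a driver named `cb`, as in the example above, the condition could be something like `lambda: cb.find_elements(By.XPATH, './/div[@class="col-4"]/a[1]')`, which is falsy while the list of matches is empty.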

II. Chrome's Remote Debugging Capability

Command-line flag: --remote-debugging-port=9221

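1. Advantages of attaching Selenium through the debugging port:

A browser launched directly carries none of the automation markers that Selenium adds at launch, so it is less likely to be detected

The browser and the Selenium program run as independent processes and do not interfere with each other

Selenium can still control Chrome, and apart from the connection option the code needs no changes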

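2. Hands-on: remote debugging with Selenium

Enable Chrome's remote debugging port

(1) Windows users

Create a new shortcut to Chrome, right-click it, open Properties, and append --remote-debugging-port=9221 to the Target field

(2) macOS and (3) Linux: start Chrome from a terminal with the same flag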
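As an alternative to editing a shortcut by hand, the same launch can be scripted. The sketch below only builds the command line; the binary paths in `DEFAULT_CHROME` are typical default install locations and are assumptions that may not match your machine, and `build_chrome_command` is a hypothetical helper name. Recent Chrome versions may ignore the debugging port unless a non-default `--user-data-dir` is also given, which is why that parameter is included.

```python
import subprocess
import sys

# Typical default install locations (assumptions; adjust for your system).
DEFAULT_CHROME = {
    "win32": r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    "darwin": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "linux": "google-chrome",
}


def build_chrome_command(port=9221, user_data_dir=None, binary=None):
    """Build the argument list that starts Chrome with remote debugging enabled."""
    if binary is None:
        binary = DEFAULT_CHROME.get(sys.platform, "google-chrome")
    command = [binary, "--remote-debugging-port={}".format(port)]
    if user_data_dir:
        command.append("--user-data-dir={}".format(user_data_dir))
    return command


if __name__ == "__main__":
    command = build_chrome_command()
    print(command)
    # Uncomment to actually launch the browser:
    # subprocess.Popen(command)
```

Passing the arguments as a list avoids shell quoting problems with the space in the macOS path.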

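Write the code:

```python
from selenium import webdriver

# Attach to the Chrome instance already listening on port 9221
# instead of launching a new browser.
options = webdriver.ChromeOptions()
options.add_experimental_option("debuggerAddress", "127.0.0.1:9221")
cb = webdriver.Chrome(options=options)

print(cb.title)  # title of the page currently open in that browser

cb.get("https://www.zhihu.com")
print(cb.title)

cb.quit()
```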

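Summary:

With the remote debugging port enabled, Chrome runs independently and gives you more freedom

Once the Selenium code starts, it takes over the running Chrome directly, and every operation works the same as before

Pay attention to the state and parameters of the page Chrome already has open, and prepare that environment before operating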
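One basic environment check is whether anything is listening on the debugging port at all, since attaching cannot work otherwise. The following is a minimal sketch with the standard library only; `debugger_is_listening` is a hypothetical helper, and it only confirms that the TCP port accepts connections, not that the listener is actually Chrome.

```python
import socket


def debugger_is_listening(host="127.0.0.1", port=9221, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


if __name__ == "__main__":
    if debugger_is_listening():
        print("Port 9221 is open; Selenium can attach.")
    else:
        print("Nothing is listening on port 9221; start Chrome with the flag first.")
```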

III. Logging In to Multiple Accounts on One Computer Through Chrome Isolation

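1. Launch flags

--remote-debugging-port=9221 (open the remote debugging port)

--user-data-dir='C:/path_to/data_dir' (directory where the browser stores its data)

--headless (run without a visible window)

--window-size=1336,768 (initial window size)

--disable-infobars (hide info bars)

--incognito (incognito mode)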

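2. Normal mode and incognito mode

In normal mode, data is saved as usual and can be read again on the next launch

Incognito mode also stores data locally, but that data is not loaded again on the next launch

Selenium lets you specify the data directory manually, which makes it possible to keep separate data for each account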
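The per-account idea can be sketched without starting a browser: each account name maps to its own directory, and that directory becomes the value of `--user-data-dir`. The helper name `chrome_arguments` and the temporary base directory are illustrative assumptions.

```python
import os
import tempfile


def chrome_arguments(base_dir, account, incognito=False):
    """Create the account's data directory and return the Chrome flags for it."""
    profile_dir = os.path.join(base_dir, account)
    os.makedirs(profile_dir, exist_ok=True)  # no error if it already exists
    arguments = ["--user-data-dir={}".format(profile_dir)]
    if incognito:
        arguments.append("--incognito")
    return arguments


if __name__ == "__main__":
    base_dir = tempfile.mkdtemp()  # throwaway directory for the demonstration
    for account in ["demo123", "test123"]:
        print(chrome_arguments(base_dir, account, incognito=account.startswith("test")))
```

Each returned string can be passed to `options.add_argument(...)` on a `webdriver.ChromeOptions()` object.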

3. Hands-on: isolating Chrome's data storage

```python
import os
from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

path = '/Users/buladou/chrome_temp_dir'

# [username, password] pairs
users = [
    ['demo123', 'demo123'],
    ['demo1234', 'demo1234'],
    ['test123', 'test123'],
    ['test1234', 'test1234'],
]

browsers = []  # keep a reference to every driver while the script sleeps
for user in users:
    options = webdriver.ChromeOptions()
    # Each account gets its own data directory.
    user_path = os.path.join(path, user[0])
    os.makedirs(user_path, exist_ok=True)
    options.add_argument("--user-data-dir={}".format(user_path))
    # Accounts without "demo" in the name also run in incognito mode.
    if 'demo' not in user[0]:
        options.add_argument("--incognito")
    cb = webdriver.Chrome(options=options)
    browsers.append(cb)
    cb.get('http://shanzhi.spbeen.com/login/')
    # Fill in the login form (the form is not submitted here).
    username = cb.find_element(By.XPATH, './/input[@name="username"]')
    username.send_keys(user[0])
    password = cb.find_element(By.XPATH, './/input[@id="MemberPassword"]')
    password.send_keys(user[1])

# Keep the script alive for an hour so the browsers can be inspected.
sleep(60 * 60)
```

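Summary:

For privacy-sensitive operations, launching the browser in incognito mode is the safer choice

Launching Selenium with a specified data directory lets the data be loaded again later, so earlier records can be read back

A browser launched normally from the system uses a default storage location, and that location is fixed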

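Advanced Web Scraping: Defeating Anti-Scraping Measures 5 (Advantages of Selenium and click operations + Chrome's remote debugging capability + logging in to multiple accounts on one computer through Chrome isolation)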
