用户代理池的作用就是模拟不同用户请求,
防止被屏蔽。
用户代理池:
这个池子也很简单,
就是多准备一些 ua 头就好了
废话不多说,先来简单的看看
import urllib.request
import random
uapool = [
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36",
"Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50",
"Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2"
]
def ua(uapool):
headers = random.choice(uapool) # 随机选取一个
print(headers)
if __name__ == '__main__':
ua(uapool)
演示随机选取,
可以封装一个函数,这样要用的时候就调用一次就好了。
接下来我们再把请求写进去
import urllib.request
import random
uapool = [
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36",
"Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50",
"Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2"
]
def ua(uapool):
headers = random.choice(uapool) # 随机选取一个
opener = urllib.request.build_opener()
opener.addheaders = ["User-Agent",headers]
urllib.request.install_opener(opener)
这样就封装好了一个请求
为什么前面要加 User-Agent 呢?
这就是头部标识,相当于 key -> value 的关系
接下来我们可以 拿 糗事百科 那个代码来演示一下
import urllib.request
import random
uapool = [
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36",
"Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50",
"Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2"
]
def ua(uapool):
headers = random.choice(uapool) # 随机选取一个
opener = urllib.request.build_opener()
opener.addheaders = ["User-Agent",headers]
urllib.request.install_opener(opener)
for i in range(1,5):
ua(uapool)
this_url = "https://www.qiushibaike.com/text/page/"+str(i)+"/"
data = urllib.request.urlopen(this_url).read().decode("utf-8","ignore")
path = '<div class="content">.*?<span>(.*?)</span>.*?</div>'
resut = re.compile(path,re.S).findall(data)
for j in resut:
print(j)
time.sleep(0.5)
这样每次循环的时候他都会选取一个 ua 代理