In this lesson, we talk about Scrapy.

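When it comes to Python web crawlers, the experts invariably bring up Scrapy. Scrapy is an application framework written for crawling websites and extracting structured data, and it can be used in a whole range of programs, such as data mining, information processing, and archiving historical data.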

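Scrapy was originally designed for page scraping (more precisely, web scraping), but it can also be used to retrieve data returned by APIs (such as Amazon Associates Web Services) or as a general-purpose web crawler.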

This lesson mainly covers how to install Scrapy.

Operating system: Windows 10, 64-bit
Python version: Python 3.5.2

step1: Install Python 3.5.2

Simply download the installer and run it. During installation, be sure to check Add Python 3.5 to PATH, which adds python to the PATH environment variable.

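If you didn't check it, that's fine too: open "Run", type cmd, and execute the following command to set the environment variable (this assumes Python was installed to C:\Python35; adjust the path to match your installation):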

C:\Python35\python.exe C:\Python35\tools\Scripts\win_add2path.py
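To confirm that the PATH change took effect, the following minimal Python sketch checks whether a directory is listed in PATH and reports where `python` resolves. `dir_on_path` is a hypothetical helper written for this illustration, and `C:\Python35` is the install directory assumed by the command above. Run it from a newly opened cmd window, because a window that was already open keeps its old PATH (use the full path `C:\Python35\python.exe` if `python` is not found yet).

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in PATH.

    `path_value` defaults to the current process's PATH; pass a string
    explicitly to test a specific value.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # normcase makes the comparison case-insensitive on Windows;
    # normpath drops trailing separators such as "C:\Python35\".
    target = os.path.normcase(os.path.normpath(directory))
    for entry in path_value.split(os.pathsep):
        if entry and os.path.normcase(os.path.normpath(entry)) == target:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical install directory, matching the win_add2path.py command.
    print("C:\\Python35 on PATH:", dir_on_path(r"C:\Python35"))
    # shutil.which looks "python" up on PATH, much as cmd does.
    print("python resolves to:", shutil.which("python"))
```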

step2: Confirm that Python is installed correctly

Reopen cmd and enter the command "python --version":

--If it shows Python 3.5.2, the installation succeeded;

--If it doesn't, take the Windows miracle cure: restart the system and try again.
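The same check can be done from inside Python, which also shows which python.exe is actually being run (useful when several versions are installed). This is a minimal sketch. `version_string` and `meets_minimum` are hypothetical helper names, and the 3.5 threshold is simply the version used in this lesson.

```python
import platform
import sys


def version_string():
    """Build the same "Python X.Y.Z" text that `python --version` prints."""
    return "Python " + platform.python_version()


def meets_minimum(version_info, minimum):
    """Compare a version tuple such as sys.version_info against e.g. (3, 5)."""
    return tuple(version_info[:len(minimum)]) >= tuple(minimum)


if __name__ == "__main__":
    print(version_string())
    # Full path of the interpreter that is running this script.
    print("executable:", sys.executable)
    if not meets_minimum(sys.version_info, (3, 5)):
        print("This interpreter is older than the 3.5 used in this lesson.")
```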

step3: Install pywin32 (32-bit version)
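pywin32 is a compiled extension, so the build you install has to match the bitness of your Python interpreter, not that of Windows itself. A 64-bit Windows 10 can run a 32-bit Python, which is why the 32-bit build can be the right one here. The following minimal sketch shows which interpreter you have. `python_bits` is a hypothetical helper name:

```python
import struct
import sys


def python_bits():
    """Return 32 or 64: the pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    bits = python_bits()
    print("This Python is {}-bit ({})".format(bits, sys.executable))
    # Choose the pywin32 build that matches `bits`, not the Windows edition.
```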

There are two ways to install it:

Method 1: go to http://sourceforge.net/projects/pywin32/, download pywin32, and double-click the installer.

Method 2: open cmd, enter the command pip install pywin32, and wait for the installation to finish.

If this fails, pip is probably not installed. Python 2.7.9+ and Python 3.4+ ship with pip, but on older versions you have to install pip yourself, as follows:

The following is only for users who don't have pip; everyone else can skip straight to step4.

•Install pip; download it from:

--pip · PyPI

a) Download get-pip.py

b) Open cmd and run: python get-pip.py

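c) Check that pip.exe is in the Scripts folder of your Python installation (Python27\Scripts for Python 2.7), and add that folder to the PATH environment variable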

d) Restart cmd and enter the command "pip --version"

•If a version number is displayed, the installation succeeded;

•If not, take the Windows miracle cure again: restart the system and try again.
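Once pip is installed, you can also query its version through the interpreter with `python -m pip --version`. Running pip as a module this way does not depend on the Scripts folder being on PATH. This is a minimal sketch. `parse_pip_version` and `get_pip_version` are hypothetical helpers, and the parser assumes pip's usual `pip X.Y from ... (python X.Y)` output line.

```python
import re
import subprocess
import sys


def parse_pip_version(output):
    """Extract "18.0" from a line like "pip 18.0 from .../pip (python 3.5)"."""
    match = re.search(r"^pip (\S+) from ", output, re.MULTILINE)
    return match.group(1) if match else None


def get_pip_version():
    """Run `python -m pip --version` with this interpreter; None if pip is missing."""
    try:
        output = subprocess.check_output(
            [sys.executable, "-m", "pip", "--version"],
            stderr=subprocess.DEVNULL,  # hide "No module named pip" on failure
            universal_newlines=True,    # return str rather than bytes (works on 3.5)
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_pip_version(output)


if __name__ == "__main__":
    version = get_pip_version()
    print("pip version:", version if version else "not found")
```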

step4: Install lxml

Again, there are two ways to install it:

Method 1: go to the "Installing lxml" page, download lxml, and double-click the installer.

Method 2: open cmd, enter the command pip install lxml, and wait for the installation to finish.

step5: Install OpenSSL (the pyOpenSSL package)

Method 1: download it from pypi.python.org/pypi/pyOpenSSL

Method 2: pip install pyOpenSSL
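The name you pass to pip is not always the name you import. pywin32 provides modules such as `win32api`, and pyOpenSSL is imported as `OpenSSL`. The following minimal sketch checks whether Python can locate the modules from steps 3 to 5 without actually importing them. `find_missing` and the `PACKAGES` list are hypothetical names chosen for this illustration.

```python
import importlib.util

# (pip package name, top-level module you import) for steps 3 to 5.
# The two names often differ, e.g. pyOpenSSL is imported as OpenSSL.
PACKAGES = [
    ("pywin32", "win32api"),
    ("lxml", "lxml"),
    ("pyOpenSSL", "OpenSSL"),
]


def find_missing(import_names):
    """Return the top-level module names Python cannot locate; nothing is imported."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(module for _, module in PACKAGES)
    for package, module in PACKAGES:
        status = "missing" if module in missing else "found"
        print("{:<10} (import {}): {}".format(package, module, status))
```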

step6: Install Scrapy

pip install Scrapy

That completes the installation. Let's verify it:

Reopen cmd and enter the command scrapy:

C:\Users\XiangyangDai>Scrapy
:0: UserWarning: You do not have a working installation of the service_identity module: 'cannot import name 'opentype''. Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied. Without the service_identity module, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.
Scrapy 1.5.1 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

Scrapy works, but a warning appeared: the service_identity module is not working because the name opentype cannot be imported. That's awkward, so let's try reinstalling service_identity:

First, uninstall service_identity.

Enter: pip uninstall service_identity

Once the uninstall finishes, install service_identity again.

Enter: pip install service_identity
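You can also script the uninstall and reinstall. This minimal sketch uses hypothetical helper names and runs both pip commands through the current interpreter. The `-y` flag answers pip's uninstall confirmation prompt automatically.

```python
import subprocess
import sys


def reinstall_commands(package):
    """Build the two pip commands: uninstall, then install."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", package],  # -y skips the "Proceed (y/n)?" prompt
        pip + ["install", package],
    ]


def reinstall(package):
    """Run the commands in order; check_call raises CalledProcessError on failure."""
    for cmd in reinstall_commands(package):
        subprocess.check_call(cmd)
```

Calling `reinstall("service_identity")` performs the same two steps as the commands above and stops at the first pip command that fails.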

Verify once more by entering scrapy:

Perfect: this time Scrapy starts without the warning.
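The warning quotes the error raised when service_identity is imported, so importing the module directly shows that error, or nothing once it is fixed. This is a minimal sketch. `import_error` is a hypothetical helper, while `scrapy.__version__` is the version string Scrapy itself exposes.

```python
import importlib


def import_error(module_name):
    """Try to import a module; return None on success, else the error text."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or any error raised while the module loads
        return "{}: {}".format(type(exc).__name__, exc)
    return None


if __name__ == "__main__":
    for name in ("service_identity", "scrapy"):
        err = import_error(name)
        print(name, "OK" if err is None else err)
    if import_error("scrapy") is None:
        import scrapy
        print("Scrapy version:", scrapy.__version__)
```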

Zero-Basics Introduction to Python (《零基础入门学习Python》), Lecture 062: The Self-Cultivation of a Crawler, Part 10: Installing Scrapy