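Preface:
In day-to-day development, data drives the entire business chain, and scraping and refreshing data with crawlers is routine work. Many languages are used for crawling: Python, Java, Ruby, and Node.js. On the Node.js side, today's star is Puppeteer, an open-source Node.js library maintained by the Google Chrome team. It is used mainly to control and automate Google Chrome or other compatible browsers.
Without further ado, let's go step by step from the basics to more advanced topics and step into the world of crawlers.
Introduction to Puppeteer
Puppeteer is a Node.js library developed by Google that provides a high-level API for controlling headless Chrome or Chromium. It can simulate what a user does in the browser, such as clicking, filling in forms, and taking screenshots. It also lets developers read the HTML after the browser has rendered the page. Its main capabilities include automation control, page manipulation, network request interception, screenshot and PDF generation, and automated testing.
In short, Puppeteer is a powerful, easy-to-use, and flexible browser automation tool that helps developers carry out all kinds of browser operations and automation tasks.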
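Environment setup
Since v1.7.0, Puppeteer has shipped as two packages: puppeteer and puppeteer-core.
- puppeteer: the full package, which downloads a Chromium executable during installation. It is large, which makes it suited to local debugging.
- puppeteer-core: does not download a browser and is small, but you have to install and update the browser yourself. It is suited to production deployments.
Supported Node.js version: >= v16.20.0
js
npm i puppeteer # or install globally: npm i -g puppeteer (latest version at the time of writing: v21.7.0)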
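If you want a script to fail fast on an unsupported runtime, a small version check can run before anything else. This is a minimal sketch. `meetsMinimumNode` is a hypothetical helper, and its default minimum simply mirrors the version stated above rather than being read from Puppeteer's package metadata.

```javascript
// Hypothetical pre-flight check: compare a Node.js version string
// against the minimum version stated above (v16.20.0).
function parseVersion(v) {
  // "v16.20.0" -> [16, 20, 0]
  return v.replace(/^v/, "").split(".").map((n) => parseInt(n, 10) || 0);
}

function meetsMinimumNode(current, minimum = "16.20.0") {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    if ((a[i] || 0) > (b[i] || 0)) return true;
    if ((a[i] || 0) < (b[i] || 0)) return false;
  }
  return true; // equal versions satisfy the minimum
}

// Warn early instead of failing later inside Puppeteer
if (!meetsMinimumNode(process.version)) {
  console.warn(`Node ${process.version} is older than the supported minimum v16.20.0`);
}
```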
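Basic Puppeteer APIs
Using headless mode
By default, Puppeteer launches the browser in headless mode (no visible window). You can turn this off with the headless option; for local debugging it is recommended to show the browser window.
js
// Default: headless mode (no browser window)
const browser = await puppeteer.launch();
// Equivalent to: await puppeteer.launch({ headless: true });
// For local debugging, launch with a visible window instead:
// const browser = await puppeteer.launch({ headless: false });
Note that Chrome 112 introduced a new headless mode, which you can opt into with a new option value: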
js
const browser = await puppeteer.launch({headless: 'new'});
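Rather than editing the launch call every time you switch between local debugging and a server, the headless value can be derived from the environment. This is a minimal sketch. `resolveHeadless` and the `HEADFUL` environment variable are hypothetical names, not something Puppeteer reads on its own.

```javascript
// Hypothetical helper: choose the `headless` launch option from an environment
// variable, so local runs show a window and servers stay headless.
// HEADFUL is an assumed variable name; Puppeteer itself does not read it.
function resolveHeadless(env = process.env) {
  if (env.HEADFUL === "1") return false; // local debugging: visible window
  return "new"; // the new headless mode introduced with Chrome 112
}

// Usage (inside an async function):
// const browser = await puppeteer.launch({ headless: resolveHeadless() });
```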
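Using puppeteer-core
When deploying to production with puppeteer-core, pay attention to the version. As far as I can tell, v16.2.0 works fine, while the latest v21.7.0 ran into some problems when deployed online.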
js
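// const puppeteer = require("puppeteer");
const puppeteer = require("puppeteer-core");
async function launchBrowser() {
  // puppeteer-core ships no browser, so executablePath must point at an installed Chrome
  const browser = await puppeteer.launch({
    // executablePath: "/usr/bin/google-chrome", // production (Linux)
    executablePath:
      "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome", // local path (macOS)
  });
  return browser;
}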
To find the value for executablePath, open chrome://version/ in Chrome and look at the "Executable Path" field.
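When the same puppeteer-core script runs on a laptop and on a server, executablePath has to differ per machine. This is a minimal sketch that picks a path based on `process.platform`. `resolveChromePath` and the `CHROME_PATH` override are hypothetical. The default paths are common install locations and are assumptions, so confirm yours via chrome://version/.

```javascript
// Hypothetical helper for puppeteer-core: pick an executablePath per OS.
// These are common default install locations (assumptions); verify them
// on each machine via chrome://version/ and override with CHROME_PATH if needed.
const DEFAULT_CHROME_PATHS = {
  darwin: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
  linux: "/usr/bin/google-chrome",
  win32: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
};

function resolveChromePath(platform = process.platform, env = process.env) {
  // An explicit environment override always wins
  if (env.CHROME_PATH) return env.CHROME_PATH;
  const path = DEFAULT_CHROME_PATHS[platform];
  if (!path) {
    throw new Error(`No default Chrome path known for platform "${platform}"`);
  }
  return path;
}

// Usage: puppeteer.launch({ executablePath: resolveChromePath() })
```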
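Passing extra command-line arguments to the browser
Use args to pass command-line switches to the browser instance you launch, for example to restrict or turn off certain features. See the list of Chromium command-line switches for details.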
js
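const puppeteer = require("puppeteer-core");
async function launchBrowser() {
  const browser = await puppeteer.launch({
    executablePath:
      "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome", // local path (macOS)
    args: [
      "--no-sandbox", // disable the Chromium sandbox (often needed when running as root in containers)
      "--disable-setuid-sandbox", // disable the setuid sandbox (Linux only)
      "--disable-extensions", // disable extensions
      "--incognito", // run in incognito mode
      "--disable-gpu", // disable GPU hardware acceleration
      "--no-zygote", // fork child processes directly instead of via a zygote process (pair with --no-sandbox)
    ],
  });
  return browser;
}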
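Some of these switches are typically only needed on Linux servers (for example, when running as root inside a container), while a local macOS run works without disabling the sandbox. This is a minimal sketch that assembles `args` per platform. `buildLaunchArgs` is a hypothetical helper, and which switches belong in which environment is an assumption you should adapt to your deployment.

```javascript
// Hypothetical helper: assemble the `args` array from the switches above.
// Sandbox-related switches are added only on Linux (where --disable-setuid-sandbox
// applies). Disabling the sandbox lowers isolation, so use it only for trusted pages.
function buildLaunchArgs({ platform = process.platform, extra = [] } = {}) {
  const args = ["--disable-extensions", "--disable-gpu"];
  if (platform === "linux") {
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  // Deduplicate in case the caller passes a switch that is already present
  return [...new Set([...args, ...extra])];
}

// Usage: puppeteer.launch({ executablePath, args: buildLaunchArgs() })
```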
Setting the browser viewport resolution
Use defaultViewport to set the default viewport size for desktop pages, for example 1920x1080:
js
const puppeteer = require("puppeteer-core");
async function launchBrowser() {
  const browser = await puppeteer.launch({
    executablePath:
      "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome", // local path (macOS)
    // Default viewport for every new page: 1920x1080 desktop resolution
    defaultViewport: {
      height: 1080,
      width: 1920,
    },
  });
  return browser;
}
Emulating a mobile device
js
const puppeteer = require("puppeteer");
const iPhone = puppeteer.devices["iPhone 6"];
(async () => {
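const browser = await puppeteer.launch({
headless: false,
});
// `page` must be created before it can be used
const page = await browser.newPage();
// Apply the iPhone 6 viewport and user agent before navigating
await page.emulate(iPhone);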
})();
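`page.emulate()` takes an object with a `userAgent` string and a `viewport` (width, height, deviceScaleFactor, isMobile, hasTouch, isLandscape). The built-in device entries use the same shape. If a device is not in the built-in list, you can build a descriptor by hand. This is a minimal sketch. `makeDevice` is a hypothetical helper, and the user agent in the usage comment is a placeholder.

```javascript
// Hypothetical helper: build a device descriptor in the shape page.emulate() expects
function makeDevice({ userAgent, width, height, scale = 2, landscape = false }) {
  if (!userAgent) throw new Error("userAgent is required");
  return {
    userAgent,
    viewport: {
      // Swap the dimensions for landscape orientation
      width: landscape ? height : width,
      height: landscape ? width : height,
      deviceScaleFactor: scale,
      isMobile: true,
      hasTouch: true,
      isLandscape: landscape,
    },
  };
}

// Usage (inside an async function, with `page` from browser.newPage()):
// await page.emulate(makeDevice({ userAgent: "<mobile user agent>", width: 375, height: 667 }));
```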
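Other
If you deploy with Docker, consult the relevant Docker resources.
Getting started with simple cases
Now that the basic APIs are covered, let's look at three small examples to see how Puppeteer works in practice.
Screenshot with device emulation
Use Puppeteer to emulate an iPhone 6, visit Baidu, and take a full-page screenshot: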
js
const puppeteer = require("puppeteer");
const iPhone = puppeteer.devices["iPhone 6"];
(async () => {
const browser = await puppeteer.launch({
headless: false,
});
const page = await browser.newPage();
await page.emulate(iPhone);
await page.goto("https://baidu.com/");
await page.screenshot({
path: "full.png",
fullPage: true,
});
console.log(await page.title());
await browser.close();
})();
Screenshot of a search
Use Puppeteer to search Baidu for "puppeteer" and save a screenshot locally:
js
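// baidu search
const puppeteer = require("puppeteer");
const screenshot = "baidu.png";
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  // try/catch must live inside the async function to catch rejected awaits
  try {
    const page = await browser.newPage();
    await page.goto("https://baidu.com");
    // Type the keyword into the search box, then click the search button
    await page.type("#kw", "puppeteer");
    await page.click("#su");
    // Give the results time to render (page.waitForTimeout is deprecated in recent Puppeteer versions)
    await new Promise((resolve) => setTimeout(resolve, 2000));
    await page.screenshot({ path: screenshot });
  } catch (err) {
    console.error(err);
  } finally {
    await browser.close();
  }
})();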
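Setting a cookie
Use Puppeteer to open PayPal with a cookie planted in advance, so that the sign-in page renders with the email (username) stored in the cookie: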
js
// set cookie
const cookie = {
name: "login_email",
value: "set_by_cookie@domain.com",
domain: ".paypal.com",
url: "https://www.paypal.com/",
path: "/",
httpOnly: true,
secure: true,
};
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.setCookie(cookie);
await page.goto("https://www.paypal.com/signin");
await page.screenshot({ path: "paypal_login.png" });
await browser.close();
})();
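If the cookies come from an existing browser session (for example, a `Cookie:` request header copied from DevTools), they must be converted into the objects that `page.setCookie()` accepts. This is a minimal sketch that assumes a plain `name=value; name2=value2` header string. `cookieHeaderToParams` is a hypothetical helper, and the domain in the usage comment mirrors the example above.

```javascript
// Hypothetical helper: turn a "Cookie:" header string into page.setCookie() params
function cookieHeaderToParams(header, { domain, path = "/" }) {
  return header
    .split(";")
    .map((pair) => pair.trim())
    .filter(Boolean)
    .map((pair) => {
      // Split on the first "=" only: cookie values may themselves contain "="
      const idx = pair.indexOf("=");
      const name = idx === -1 ? pair : pair.slice(0, idx);
      const value = idx === -1 ? "" : pair.slice(idx + 1);
      return { name: name.trim(), value: value.trim(), domain, path };
    });
}

// Usage (inside an async function, before page.goto):
// await page.setCookie(...cookieHeaderToParams("a=1; b=2", { domain: ".paypal.com" }));
```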
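Real-world cases
The simple examples above should give you a rough picture of the overall workflow. Next, let's work through the mainstream scraping approaches one by one and see how each of them works.
Parsing HTML
Taking codashop as an example, we select and filter the relevant DOM nodes in the HTML and extract SKU (product) data such as the price and product name.
Example code: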
js
// codashop
const puppeteer = require("puppeteer");
const ua =
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5410.0 Safari/537.36";
const config = {
url: "https://www.codashop.com/en-my/pubg-mobile-uc-redeem-code",
gameName: "pubgm",
currency: "RM",
country: "my",
thous_separator: ",",
decimal_point_separator: ".",
};
// Delay helper: resolves after t milliseconds
const waitFor = async (t) => {
return new Promise((r) => setTimeout(r, t));
};
try {
const run = async () => {
const roots = ".form-section__denom-group";
const browser = await puppeteer.launch({
headless: false,
defaultViewport: {
height: 1080,
width: 1920,
},
args: ["--no-sandbox"],
});
const page = await browser.newPage();
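// Default timeout for waits and other page operations (100 s)
page.setDefaultTimeout(100000);
// Default navigation timeout for goto and similar calls (50 s)
page.setDefaultNavigationTimeout(50000);
// Set the user agent
ua && (await page.setUserAgent(ua));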
await page.goto(config.url, { waitUntil: "domcontentloaded" });
const section__denom = await page.waitForSelector(roots);
if (!section__denom) return [];
const params = { ...config, platform: "codashop" };
// Give client-side rendering a moment to finish before reading the DOM
await waitFor(1000);
// Query the DOM inside the page; only serializable data comes back to Node
const jsons = await page.evaluate(
(args, rootSelector) => {
const items = Array.from(document.querySelectorAll(`${rootSelector} li`));
if (items.length === 0) return [];
return items.map((item) => {
// Look up name and price per item so a missing node never reuses the previous item's value
const skuNameDom = item.querySelector(".form-section__denom-data-section");
const skuPriceDom = item.querySelector(".starting-price-value");
return {
price: skuPriceDom ? skuPriceDom.innerText : "",
sku_name: (skuNameDom && skuNameDom.innerText) || "SKU_NAME",
currency: args.currency,
platform: args.platform,
game: args.gameName,
country: args.country,
};
});
},
params,
roots
);
console.log(jsons);
/**
* [
{
price: 'RM4.50',
sku_name: '60 UC',
currency: 'RM',
platform: 'codashop',
game: 'pubgm',
country: 'my'
},
{
price: 'RM22.50',
sku_name: '325 UC',
currency: 'RM',
platform: 'codashop',
game: 'pubgm',
country: 'my'
},
{
price: 'RM45.00',
sku_name: '660 UC',
currency: 'RM',
platform: 'codashop',
game: 'pubgm',
country: 'my'
},
{
price: 'RM112.50',
sku_name: '1800 UC',
currency: 'RM',
platform: 'codashop',
game: 'pubgm',
country: 'my'
},
{
price: 'RM225.00',
sku_name: '3850 UC',
currency: 'RM',
platform: 'codashop',
game: 'pubgm',
country: 'my'
},
{
price: 'RM450.00',
sku_name: '8100 UC',
currency: 'RM',
platform: 'codashop',
game: 'pubgm',
country: 'my'
}
]
*/
await browser.close();
};
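// The outer try/catch cannot catch async errors, so handle the returned promise
run().catch((err) => console.error(err));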
} catch (err) {
console.error(err);
}
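The config objects in these examples carry `thous_separator` and `decimal_point_separator`, but the scraped `price` remains a raw string such as `'RM4.50'`. Here is a minimal sketch of how those two fields could turn the text into a number. `parsePrice` is a hypothetical helper, and it assumes the price text contains only one number.

```javascript
// Hypothetical helper: normalize scraped price text using the config's separators
function parsePrice(text, { thous_separator = ",", decimal_point_separator = "." } = {}) {
  let s = String(text);
  // Remove thousands separators first (skipped when the config uses "")
  if (thous_separator) s = s.split(thous_separator).join("");
  // Normalize the decimal separator to "."
  if (decimal_point_separator !== ".") s = s.split(decimal_point_separator).join(".");
  // Keep the first numeric run, dropping currency labels such as "RM"
  const match = s.match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : NaN;
}

// parsePrice("RM4.50", config)      -> 4.5
// parsePrice("RM 1,234.00", config) -> 1234
```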
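Parsing SSR-rendered data
Taking jollymax as an example, the page source reveals two things: which framework the site uses and whether it is server-side rendered (SSR). Together they tell us where the data lives. This site uses Nuxt.js SSR, so the server-rendered state is available on window.__NUXT__, as shown in the figure:
Example code: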
js
// jollymax
const puppeteer = require("puppeteer");
const ua =
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5410.0 Safari/537.36";
const config = {
url: "https://www.jollymax.com/ru/PUBG",
gameName: "pubgm",
currency: "RUB",
country: "ru",
thous_separator: "", // thousands separator
decimal_point_separator: ".", // decimal separator
};
try {
const run = async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: {
height: 1080,
width: 1920,
},
args: ["--no-sandbox"],
});
// Create a page
const page = await browser.newPage();
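// Default timeout for waits and other page operations (100 s)
page.setDefaultTimeout(100000);
// Default navigation timeout for goto and similar calls (50 s)
page.setDefaultNavigationTimeout(50000);
// Set the user agent
ua && (await page.setUserAgent(ua));
// Intercept requests
await page.setRequestInterception(true);
page.on("request", async (request) => {
// Abort unnecessary resources (images, fonts, stylesheets) to speed up loading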
if (
request.resourceType() == "image" ||
request.resourceType() == "font" ||
request.resourceType() == "stylesheet"
) {
await request.abort();
} else {
await request.continue();
}
});
await page.goto(config.url, { waitUntil: "domcontentloaded" });
// Wait until the right-hand content panel exists in the DOM
await page.waitForSelector(".content-right-part");
const params = { ...config, platform: "jollymax" };
const result = await page.evaluate(async (args) => {
let filterResults = [];
if (
window &&
window.__NUXT__ &&
window.__NUXT__.data &&
window.__NUXT__.data.length
) {
const _serverData = window.__NUXT__.data[0]?.serverData;
// Guard against a missing serverData object ("in" throws on undefined)
if (_serverData && "pageData" in _serverData) {
const glist = _serverData.pageData?.pageInfo?.goodsList;
if (!(glist && glist.length)) return filterResults;
const getPrice = (item) => {
let result = "0";
if (item.payTypeList && item.payTypeList.length) {
result = item.payTypeList[0].amount;
}
return result.toString();
};
// Use the price of the first payment channel by default
return glist.map((item) => {
const price = getPrice(item);
return {
currency: item.currency || args.currency,
platform: args.platform,
game: args.gameName,
country: args.country,
price,
sku_name: item?.goodsName || "SKU_NAME",
};
});
}
}
return [];
}, params);
console.log(result);
/**
* [
{
currency: 'RUB',
platform: 'jollymax',
game: 'pubgm',
country: 'ru',
price: '91',
sku_name: '60 UC'
},
{
currency: 'RUB',
platform: 'jollymax',
game: 'pubgm',
country: 'ru',
price: '440',
sku_name: '325 UC'
},
{
currency: 'RUB',
platform: 'jollymax',
game: 'pubgm',
country: 'ru',
price: '910',
sku_name: '660 UC'
},
{
currency: 'RUB',
platform: 'jollymax',
game: 'pubgm',
country: 'ru',
price: '2248',
sku_name: '1800 UC'
},
{
currency: 'RUB',
platform: 'jollymax',
game: 'pubgm',
country: 'ru',
price: '4500',
sku_name: '3850 UC'
},
{
currency: 'RUB',
platform: 'jollymax',
game: 'pubgm',
country: 'ru',
price: '9100',
sku_name: '8100 UC'
},
{
currency: 'RUB',
platform: 'jollymax',
game: 'pubgm',
country: 'ru',
price: '1442',
sku_name: 'RP Upgrade Pack-A3'
},
{
currency: 'RUB',
platform: 'jollymax',
game: 'pubgm',
country: 'ru',
price: '3608',
sku_name: 'Elite RP Upgrade Pack-A3'
}
]
*/
await browser.close();
};
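// The outer try/catch cannot catch async errors, so handle the returned promise
run().catch((err) => console.error(err));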
} catch (err) {
console.error(err);
}
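The in-page callback above mixes data extraction with browser code, which makes it hard to test. One option, a sketch and not the author's approach, is to return the raw object with `page.evaluate(() => window.__NUXT__)` and transform it in Node with a pure function that can be unit-tested against a saved snapshot. This assumes the payload is JSON-serializable and has the `data[0].serverData.pageData.pageInfo.goodsList` shape used above. `extractGoods` is a hypothetical name.

```javascript
// Hypothetical pure transform over a window.__NUXT__ snapshot (same path as above)
function extractGoods(nuxt, args) {
  const goodsList =
    nuxt?.data?.[0]?.serverData?.pageData?.pageInfo?.goodsList ?? [];
  return goodsList.map((item) => ({
    currency: item.currency || args.currency,
    platform: args.platform,
    game: args.gameName,
    country: args.country,
    // Same rule as above: price of the first payment channel, else "0"
    price: String(item.payTypeList?.[0]?.amount ?? "0"),
    sku_name: item.goodsName || "SKU_NAME",
  }));
}

// Usage (inside run()):
// const nuxt = await page.evaluate(() => window.__NUXT__);
// const result = extractGoods(nuxt, params);
```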
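Intercepting HTTP requests and responses
Taking razer as an example, find the network request whose data renders the current page. Then intercept the response for the current game and parse it to get the product names, prices, and other details.
Example code: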
js
// razer
const puppeteer = require("puppeteer");
const ua =
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5410.0 Safari/537.36";
const config = {
url: "https://gold.razer.com/my/en/gold/catalog/pubgm",
gameName: "pubgm",
currency: "RM",
country: "my",
thous_separator: ",",
decimal_point_separator: ".",
};
try {
const run = async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: {
height: 1080,
width: 1920,
},
args: ["--no-sandbox"],
});
const page = await browser.newPage();
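// Default timeout for waits and other page operations (100 s)
page.setDefaultTimeout(100000);
// Default navigation timeout for goto and similar calls (50 s)
page.setDefaultNavigationTimeout(50000);
// Set the user agent
ua && (await page.setUserAgent(ua));
// Intercept requests
await page.setRequestInterception(true);
page.on("request", async (request) => {
// Abort unnecessary resources (images, fonts) to speed up loading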
if (
request.resourceType() == "image" ||
request.resourceType() == "font"
) {
await request.abort();
} else {
await request.continue();
}
});
// Resolve with the SKU list from the first JSON response that contains gameSkus
function getResValue() {
return new Promise((resolve) => {
page.on("response", async (response) => {
const url = response.url();
// Not every response has a content-type header
const contentType = response.headers()["content-type"] || "";
// The last path segment of the API URL is used as the game name
const _url =
url && url.indexOf("/") !== -1 ? url.split("/").pop() : "";
if (!_url || !contentType.includes("application/json")) return;
let jsons;
try {
jsons = await response.json();
} catch (e) {
// Redirects and empty bodies cannot be parsed; skip them
return;
}
if (jsons && Array.isArray(jsons.gameSkus) && jsons.gameSkus.length) {
const result = jsons.gameSkus.map((item) => {
const price = item.unitGold || item.unitBaseGold || 0;
const sku_name =
item.productName || item.vanityName || "SKU_NAME";
return {
currency: config.currency,
country: config.country,
platform: "razer",
game: _url,
price: price.toString(),
sku_name,
};
});
resolve(result);
}
});
});
}
// Register the listener before navigating so the response cannot be missed
const resultPromise = getResValue();
await page.goto(config.url);
const result = await resultPromise;
console.log(result);
/**
* [
{
currency: 'RM',
country: 'my',
platform: 'razer',
game: 'pubgm',
price: '5',
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM5)'
},
{
currency: 'RM',
country: 'my',
platform: 'razer',
game: 'pubgm',
price: '10',
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM10)'
},
{
currency: 'RM',
country: 'my',
platform: 'razer',
game: 'pubgm',
price: '20',
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM20)'
},
{
currency: 'RM',
country: 'my',
platform: 'razer',
game: 'pubgm',
price: '30',
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM30)'
},
{
currency: 'RM',
country: 'my',
platform: 'razer',
game: 'pubgm',
price: '40',
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM40)'
},
{
currency: 'RM',
country: 'my',
platform: 'razer',
game: 'pubgm',
price: '50',
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM50)'
},
{
currency: 'RM',
country: 'my',
platform: 'razer',
game: 'pubgm',
price: '100',
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM100)'
},
{
currency: 'RM',
country: 'my',
platform: 'razer',
game: 'pubgm',
price: '200',
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM200)'
},
{
currency: 'RM',
country: 'my',
platform: 'razer',
game: 'pubgm',
price: '300',
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM300)'
}
]
*/
await browser.close();
};
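// The outer try/catch cannot catch async errors, so handle the returned promise
run().catch((err) => console.error(err));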
} catch (err) {
console.error(err);
}
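Puppeteer also provides `page.waitForResponse()`, which resolves with the first response that matches a predicate. It can replace the manual listener bookkeeping in `getResValue`. This is a minimal sketch. Based on the `game: 'pubgm'` values in the output above, it assumes that the SKU JSON is served from a URL whose last path segment is the game slug. `isSkuResponse` and `fetchGameSkus` are hypothetical helpers, and `page` is a Puppeteer page passed in by the caller.

```javascript
// Hypothetical predicate: a JSON response whose last path segment is the game slug
function isSkuResponse(response, game) {
  const contentType = response.headers()["content-type"] || "";
  // Drop any query string before comparing the last path segment
  const lastSegment = response.url().split("?")[0].split("/").pop();
  return contentType.includes("application/json") && lastSegment === game;
}

// Start waiting before navigation (array elements are evaluated left to right)
async function fetchGameSkus(page, url, game, timeout = 30000) {
  const [response] = await Promise.all([
    page.waitForResponse((res) => isSkuResponse(res, game), { timeout }),
    page.goto(url),
  ]);
  const json = await response.json();
  return Array.isArray(json.gameSkus) ? json.gameSkus : [];
}

// Usage (inside run()):
// const skus = await fetchGameSkus(page, config.url, config.gameName);
```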
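Simulating user clicks
Again using razer as an example, locate the DOM element for each product on the page and click it. Each click loads the payment-channel prices for that product.
Example code: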
js
// razer: simulate user clicks
const puppeteer = require("puppeteer");
const ua =
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5410.0 Safari/537.36";
const config = {
url: "https://gold.razer.com/my/en/gold/catalog/pubgm",
gameName: "pubgm",
currency: "RM",
country: "my",
thous_separator: ",",
decimal_point_separator: ".",
};
const waitFor = async (t) => {
return new Promise((r) => setTimeout(r, t));
};
const gameSkuList = [];
try {
const run = async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: {
height: 1080,
width: 1920,
},
args: ["--no-sandbox"],
});
const page = await browser.newPage();
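// Default timeout for waits and other page operations (100 s)
page.setDefaultTimeout(100000);
// Default navigation timeout for goto and similar calls (50 s)
page.setDefaultNavigationTimeout(50000);
// Set the user agent
ua && (await page.setUserAgent(ua));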
await page.goto(config.url);
await waitFor(3000);
const webshopStepSku = await page.waitForSelector("#webshop_step_sku");
if (!webshopStepSku) {
throw new Error("The current IP appears to be blocked!");
}
const skuItem = await page.$$("#webshop_step_sku .sku-list__item");
const darkFilter = await page.$(".onetrust-pc-dark-filter");
// Hide the overlay backdrop so it does not block clicks on the SKU list
await page.evaluateHandle((element) => {
element && (element.style.display = "none");
}, darkFilter);
const params = { ...config };
const getCards = async (dList, args) => {
for (let d of dList) {
const sku_name = await page.evaluate((element) => {
const res = element.querySelector(".selection-tile__text");
return res ? res.innerText : "";
}, d);
await d.click();
// Wait for the payment channels to refresh for the selected SKU
await waitFor(1500);
const price_text = await page.evaluate(() => {
const channels = document.querySelector("#webshop_step_payment_channels");
if (!channels) return "0";
const details = channels.querySelectorAll(".selection-tile-promos__details");
// Prefer the second payment channel; fall back to the first one (the wallet)
const _details = details[1] || details[0];
if (!_details) return "0";
const _card = _details.querySelector(".align-self-center.text-right");
return (_card && _card.innerText) || "0";
});
const jons = {
sku_name,
price: price_text,
currency: args.currency,
platform: "pubgm",
game: args.gameName,
country: args.country,
};
gameSkuList.push(jons);
}
return gameSkuList;
};
const result = await getCards(skuItem, params);
console.log(result);
/**
* [
{
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM5)',
price: 'RM 5.00',
currency: 'RM',
platform: 'pubgm',
game: 'pubgm',
country: 'my'
},
{
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM10)',
price: 'RM 10.00',
currency: 'RM',
platform: 'pubgm',
game: 'pubgm',
country: 'my'
},
{
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM20)',
price: 'RM 20.00',
currency: 'RM',
platform: 'pubgm',
game: 'pubgm',
country: 'my'
},
{
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM30)',
price: 'RM 30.00',
currency: 'RM',
platform: 'pubgm',
game: 'pubgm',
country: 'my'
},
{
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM40)',
price: 'RM 40.00',
currency: 'RM',
platform: 'pubgm',
game: 'pubgm',
country: 'my'
},
{
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM50)',
price: 'RM 50.00',
currency: 'RM',
platform: 'pubgm',
game: 'pubgm',
country: 'my'
},
{
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM100)',
price: 'RM 100.00',
currency: 'RM',
platform: 'pubgm',
game: 'pubgm',
country: 'my'
},
{
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM200)',
price: 'RM 200.00',
currency: 'RM',
platform: 'pubgm',
game: 'pubgm',
country: 'my'
},
{
sku_name: 'Razer Gold Direct Top-Up PIN (MY) - (RM300)',
price: 'RM 300.00',
currency: 'RM',
platform: 'pubgm',
game: 'pubgm',
country: 'my'
}
]
*/
await browser.close();
};
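// The outer try/catch cannot catch async errors, so handle the returned promise
run().catch((err) => console.error(err));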
} catch (err) {
console.error(err);
}
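The loop in `getCards` uses `for...of` with `await` instead of `Promise.all(dList.map(...))` because every click targets the same page. Each click changes the payment panel, so reading the price must finish before the next click. You can factor that pattern into a small generic helper. This is a sketch, and `mapSequential` is a hypothetical name.

```javascript
// Hypothetical helper: run an async step for each item strictly one after another
async function mapSequential(items, step) {
  const results = [];
  for (const [index, item] of items.entries()) {
    // Awaiting inside the loop guarantees step N finishes before step N+1 starts
    results.push(await step(item, index));
  }
  return results;
}

// Usage inside run(), with `skuItem`, `page`, and `waitFor` from the example above:
// const names = await mapSequential(skuItem, async (handle) => {
//   await handle.click();
//   await waitFor(1500);
//   return page.evaluate((el) => el.innerText, handle);
// });
```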
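Advanced usage
In practice, different sites come with their own special cases. How do you crawl many pages? How do you get past CAPTCHAs or evade bot detection? Let's unlock Puppeteer's more powerful features.
Bypassing bot detection
We can test against a bot-detection site. In the comparison, the left side is a real user's visit and the right side is a Puppeteer visit. On the right, the WebDriver entry is clearly flagged in red. Tips: results may differ across browsers.
We can use the puppeteer-extra-plugin-stealth plugin, part of the puppeteer-extra family. With it enabled, the page in the right-hand image no longer shows the failure.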
js
// Evade bot detection with the stealth plugin
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");
puppeteer.use(StealthPlugin());
(async () => {
const browser = await puppeteer.launch({
headless: false,
});
const page = await browser.newPage();
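await page.goto("https://bot.sannysoft.com/");
// Save the detection results before closing, since the window closes immediately
await page.screenshot({ path: "stealth.png", fullPage: true });
await browser.close();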
})();
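The stealth plugin bundles many small evasions. One of them targets the `navigator.webdriver` flag that detection pages like the one above inspect. As a simplified illustration of that single evasion (not the plugin's actual code), `page.evaluateOnNewDocument()` can patch the flag before any site script runs. `hideWebdriverFlag` is a hypothetical helper, and on its own it is unlikely to pass most detection checks.

```javascript
// Hypothetical helper: make navigator.webdriver read as false in every new document.
// Call it before page.goto() so the patch runs ahead of the site's own scripts.
async function hideWebdriverFlag(page) {
  await page.evaluateOnNewDocument(() => {
    // Redefine the getter on the prototype; regular Chrome reports false here
    Object.defineProperty(Navigator.prototype, "webdriver", {
      get: () => false,
      configurable: true,
    });
  });
}

// Usage (inside an async function, with plain puppeteer):
// const page = await browser.newPage();
// await hideWebdriverFlag(page);
// await page.goto("https://bot.sannysoft.com/");
```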
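Bypassing CAPTCHA checks
Some sites require CAPTCHA verification, such as Google's reCAPTCHA shown in the figure below. The puppeteer-extra-plugin-recaptcha plugin can solve the challenge so the rest of the data workflow can continue.
Tips: this costs money, because a paid third-party service (2captcha in this example) does the solving.
Example code: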
js
const puppeteer = require("puppeteer-extra");
const RecaptchaPlugin = require("puppeteer-extra-plugin-recaptcha");
puppeteer.use(
RecaptchaPlugin({
provider: {
id: "2captcha",
token: "xxxxx", // your paid 2captcha API key
},
visualFeedback: true,
})
);
const waitFor = async (t) => {
return new Promise((r) => setTimeout(r, t));
};
puppeteer.launch({ headless: false }).then(async (browser) => {
const page = await browser.newPage();
await page.goto("https://www.google.com/recaptcha/api2/demo");
await page.solveRecaptchas();
await Promise.all([
page.waitForNavigation(),
page.click(`#recaptcha-demo-submit`),
]);
await page.screenshot({ path: 'response.png', fullPage: true })
await browser.close()
});
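Crawling with multiple workers
We often need to crawl several URLs at the same time. To keep performance under control, puppeteer-cluster can manage a pool of workers that process different sites in parallel and reduce overhead. Depending on the concurrency mode, each worker is a page, an incognito context, or a whole browser.
Example code: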
js
const { Cluster } = require("puppeteer-cluster");
(async () => {
// Create a cluster with up to 3 concurrent workers, each in its own incognito browser context
const cluster = await Cluster.launch({
concurrency: Cluster.CONCURRENCY_CONTEXT,
maxConcurrency: 3,
puppeteerOptions: {
headless: false,
},
});
// Define a task (in this case: screenshot of page)
await cluster.task(async ({ page, data: url }) => {
await page.goto(url);
const path = url.replace(/[^a-zA-Z]/g, "_") + ".png";
await page.screenshot({ path });
console.log(`Screenshot of ${url} saved: ${path}`);
});
// Add some pages to queue
cluster.queue("https://www.baidu.com");
cluster.queue("https://www.bing.com/?mkt=zh-CN");
cluster.queue("https://github.com/");
// Shutdown after everything is done
await cluster.idle();
await cluster.close();
})();
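`maxConcurrency` caps how many tasks run at once, while the rest wait in the queue. Here is a minimal sketch of that idea in plain JavaScript without puppeteer-cluster, so it has no retries and no browser management. `runWithConcurrency` is a hypothetical helper and is not part of puppeteer-cluster's API.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` tasks in flight,
// returning results in input order
async function runWithConcurrency(items, limit, worker) {
  if (limit < 1) throw new RangeError("limit must be >= 1");
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out
  // Each lane keeps pulling the next unclaimed item until the queue is empty
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Usage with Puppeteer would pass URLs and a worker that opens a page per URL,
// e.g. runWithConcurrency(urls, 3, async (url) => { /* page.goto + screenshot */ })
```

The shared `next` counter is safe without locks because JavaScript runs each lane on a single thread and only switches between lanes at `await` points.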
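Summary
Puppeteer can automate a great deal for us, but keep its trade-offs in mind. Memory-heavy tasks can drive memory usage very high, and having to start a real Chrome instance slows down applications that need to respond quickly.
Overall, Puppeteer is a powerful and easy-to-use browser automation tool that fits a wide range of scenarios. When deciding whether to use it, however, weigh its two main drawbacks: high system resource consumption and slow startup.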