Maxun: An Open-Source, No-Code Web Scraper

Reposted from a trending GitHub project

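GitHub - getmaxun/maxun: Open-source no-code web data extraction platform. Turn websites to APIs and spreadsheets with no-code robots in minutes! [In Beta]. See the GitHub repository (https://github.com/getmaxun/maxun) for full details. This post mainly covers a few small problems I ran into while deploying it myself. If you only want my own hands-on experience, skip straight to the last section.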

Maxun Open-Source No-Code Web Data Extraction Platform

Local Setup

Docker Compose
    git clone https://github.com/getmaxun/maxun
    cd maxun
    docker-compose up -d --build

Without Docker
  1. Ensure you have Node.js, PostgreSQL, MinIO and Redis installed on your system.

  2. Run the commands below

    git clone https://github.com/getmaxun/maxun

    # change directory to the project root
    cd maxun

    # install dependencies
    npm install

    # change directory to maxun-core to install dependencies
    cd maxun-core
    npm install

    # return to the project root
    cd ..

    # start frontend and backend together
    npm run start

You can access the frontend at http://localhost:5173/ and backend at http://localhost:8080/
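
Before running the non-Docker steps above, it can help to confirm that the prerequisites from step 1 are actually on your PATH. The following is a minimal sketch, not part of Maxun: missing_tools is a hypothetical helper, and the binary names in the example call (node, npm, psql, redis-server, minio) are assumptions about a typical installation that may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper (not part of Maxun): print the names of the given
# commands that cannot be found on PATH, separated by spaces.
missing_tools() {
    missing=""
    for tool in "$@"; do
        # command -v succeeds only when the shell can resolve the name
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="${missing}${missing:+ }${tool}"
        fi
    done
    printf '%s\n' "$missing"
}

# Example call. The binary names are assumptions; adjust them as needed:
#   missing_tools node npm psql redis-server minio
```

An empty result means every listed command was found. A server installed without its command-line tools (for example PostgreSQL without psql) would still be reported as missing, so treat the output as a hint rather than proof.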

Environment Variables

  1. Create a file named .env in the root folder of the project
  2. An example env file can be viewed in the GitHub repository.

| Variable | Mandatory | Description | If Not Set |
|---|---|---|---|
| BACKEND_URL | Yes | URL to run backend on. | Backend won't start. If not sure, set to http://localhost:8080 |
| VITE_BACKEND_URL | Yes | URL the frontend uses to reach the backend (Vite exposes VITE_-prefixed variables to client code). | Frontend won't be able to reach the backend. If not sure, set to http://localhost:8080 |
| JWT_SECRET | Yes | Secret key used to sign and verify JSON Web Tokens (JWTs) for authentication. | JWT authentication will not work. |
| DB_NAME | Yes | Name of the Postgres database to connect to. | Database connection will fail. |
| DB_USER | Yes | Username for Postgres database authentication. | Database connection will fail. |
| DB_PASSWORD | Yes | Password for Postgres database authentication. | Database connection will fail. |
| DB_HOST | Yes | Host address where the Postgres database server is running. | Database connection will fail. |
| DB_PORT | Yes | Port number used to connect to the Postgres database server. | Database connection will fail. |
| ENCRYPTION_KEY | Yes | Key used for encrypting sensitive data (proxies, passwords). | Encryption functionality will not work. |
| MINIO_ENDPOINT | Yes | Endpoint URL for MinIO, to store Robot Run Screenshots. | Connection to MinIO storage will fail. |
| MINIO_PORT | Yes | Port number for MinIO service. | Connection to MinIO storage will fail. |
| MINIO_ACCESS_KEY | Yes | Access key for authenticating with MinIO. | MinIO authentication will fail. |
| GOOGLE_CLIENT_ID | No | Client ID for Google OAuth, used for Google Sheet integration authentication. | Google login will not work. |
| GOOGLE_CLIENT_SECRET | No | Client Secret for Google OAuth. | Google login will not work. |
| GOOGLE_REDIRECT_URI | No | Redirect URI for handling Google OAuth responses. | Google login will not work. |
| REDIS_HOST | Yes | Host address of the Redis server, used by BullMQ for scheduling robots. | Redis connection will fail. |
| REDIS_PORT | Yes | Port number for the Redis server. | Redis connection will fail. |
| MAXUN_TELEMETRY | No | Disables telemetry to stop sending anonymous usage data. Keeping it enabled helps us understand how the product is used and assess the impact of any new changes. Please keep it enabled. | Telemetry data will not be collected. |
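
Because a single missing mandatory variable can stop the backend or a database connection, a quick check of the .env file before starting may save a failed run. The sketch below is an illustration only: missing_env_vars is a hypothetical helper, the variable list is copied from the table above, and a variable is treated as set when a line of the form NAME=value with a non-empty value exists (quoting and export prefixes are not handled).

```shell
#!/bin/sh
# Mandatory variables, copied from the table above.
MAXUN_REQUIRED_VARS="BACKEND_URL VITE_BACKEND_URL JWT_SECRET DB_NAME DB_USER DB_PASSWORD DB_HOST DB_PORT ENCRYPTION_KEY MINIO_ENDPOINT MINIO_PORT MINIO_ACCESS_KEY REDIS_HOST REDIS_PORT"

# Hypothetical helper (not part of Maxun): print the mandatory variables
# that have no non-empty NAME=value line in the given env file.
# Usage: missing_env_vars [path-to-env-file]   (defaults to ./.env)
missing_env_vars() {
    env_file="${1:-.env}"
    missing=""
    for name in $MAXUN_REQUIRED_VARS; do
        # If the file is absent, grep fails and every variable is reported.
        if ! grep -Eq "^${name}=.+" "$env_file" 2>/dev/null; then
            missing="${missing}${missing:+ }${name}"
        fi
    done
    printf '%s\n' "$missing"
}
```

Calling missing_env_vars from the project root prints a space-separated list of the variables still to be filled in, or an empty line when all of them are present.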

How Does It Work

Maxun lets you create custom robots which emulate user actions and extract data. A robot can perform any of these actions: Capture List, Capture Text or Capture Screenshot. Once a robot is created, it will keep extracting data for you without manual intervention.

1. Robot Actions

  1. Capture List: Useful to extract structured and bulk items from the website. Example: Scrape products from Amazon etc.
  2. Capture Text: Useful to extract individual text content from the website.
  3. Capture Screenshot: Get full-page or visible-section screenshots of the website.

2. BYOP

BYOP (Bring Your Own Proxy) lets you connect external proxies to bypass anti-bot protection. Currently, the proxies are per user. Soon you'll be able to configure proxy per robot.

Features

  • Extract Data With No-Code
  • Handle Pagination & Scrolling
  • Run Robots On A Specific Schedule
  • Turn Websites to APIs
  • Turn Websites to Spreadsheets
  • Adapt To Website Layout Changes (coming soon)
  • Extract Behind Login, With Two-Factor Authentication Support (coming soon)
  • Integrations (currently Google Sheets)
  • +++ A lot of amazing things soon!

Problems I Ran Into

I'm on a Mac. I first downloaded Docker Desktop from the Docker website, then opened it and ran the following from inside Docker Desktop:

git clone https://github.com/getmaxun/maxun and docker-compose up -d --build. At this point the Maxun code had been cloned, but the build failed with an error saying the env file could not be found. You therefore need to create a .env file yourself in the Maxun project directory; its contents can be copied directly from the example env file.
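
The fix described above can be sketched as a small script. This is only an illustration: ensure_env_file is a hypothetical helper, and the path of the example env file is passed in as an argument because this post does not name that file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Maxun): create PROJECT_DIR/.env from an
# example env file unless a .env is already there.
# Usage: ensure_env_file PROJECT_DIR EXAMPLE_FILE
ensure_env_file() {
    project_dir="$1"
    example_file="$2"
    if [ -f "$project_dir/.env" ]; then
        # Never overwrite an existing .env, since it may hold real secrets.
        echo "exists"
    elif [ -f "$example_file" ]; then
        cp "$example_file" "$project_dir/.env" && echo "created"
    else
        echo "example-missing"
        return 1
    fi
}
```

After the file is created, replace the placeholder values (secrets, passwords, keys) with your own before running docker-compose up -d --build again.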

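Running docker-compose up -d --build again still failed: the image downloads could not get through, because the Docker environment had no unrestricted (proxied) internet access. I then ran the same command from the Mac's own terminal instead. Since the Mac itself has unrestricted access, all the images downloaded, the application built, and it was ready to use.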
