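PHP + CSS + JS Data Scraping and Display System

This article walks through a complete system: a single-page front end for display, a back end for data scraping, files separated by role, and support for pseudo-static (rewritten) URLs.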

File structure

text
project/
├── assets/
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── script.js
├── includes/
│   ├── config.php
│   ├── scraper.php
│   └── functions.php
├── index.php
└── .htaccess (for pseudo-static URLs)

1. Configuration file (includes/config.php)

php
<?php
// Database settings (placeholder credentials; replace them before deploying)
define('DB_HOST', 'localhost');
define('DB_USER', 'root');
define('DB_PASS', 'password');
define('DB_NAME', 'scraping_db');

// Scrape targets: each key names a site, with its URL and the selectors to extract
define('TARGET_SITES', [
    'example' => [
        'url' => 'https://example.com/news',
        'selectors' => [
            'title' => 'h1.news-title',
            'content' => 'div.news-content',
            'date' => 'span.news-date'
        ]
    ],
    'another_site' => [
        'url' => 'https://another-site.com/articles',
        'selectors' => [
            'title' => 'article h2',
            'summary' => 'article p.summary',
            'author' => 'div.author-name'
        ]
    ]
]);

// Other settings
define('ITEMS_PER_PAGE', 10);

2. Database functions (includes/functions.php)

php
<?php
require_once __DIR__ . '/config.php';

class DB {
    private static $instance = null;
    private $connection;
    
    private function __construct() {
        $this->connection = new mysqli(DB_HOST, DB_USER, DB_PASS, DB_NAME);
        if ($this->connection->connect_error) {
            die("Connection failed: " . $this->connection->connect_error);
        }
        $this->connection->set_charset("utf8mb4");
    }
    
    public static function getInstance() {
        if (!self::$instance) {
            self::$instance = new DB();
        }
        return self::$instance;
    }
    
    public function getConnection() {
        return $this->connection;
    }
    
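    public function query($sql, $params = []) {
        $stmt = $this->connection->prepare($sql);
        if ($stmt === false) {
            throw new RuntimeException('Prepare failed: ' . $this->connection->error);
        }
        if (!empty($params)) {
            // Bind integers as 'i' and floats as 'd' so LIMIT placeholders receive numbers, not strings
            $types = '';
            foreach ($params as $param) {
                $types .= is_int($param) ? 'i' : (is_float($param) ? 'd' : 's');
            }
            $stmt->bind_param($types, ...$params);
        }
        $stmt->execute();
        return $stmt;
    }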
    
    public function fetchAll($sql, $params = []) {
        $stmt = $this->query($sql, $params);
        $result = $stmt->get_result();
        return $result->fetch_all(MYSQLI_ASSOC);
    }
    
    public function fetchOne($sql, $params = []) {
        $stmt = $this->query($sql, $params);
        $result = $stmt->get_result();
        return $result->fetch_assoc();
    }
}

// Create the scraped_data table if it does not exist yet
function initDatabase() {
    $db = DB::getInstance()->getConnection();
    
    $sql = "CREATE TABLE IF NOT EXISTS scraped_data (
        id INT AUTO_INCREMENT PRIMARY KEY,
        source_site VARCHAR(50) NOT NULL,
        title VARCHAR(255) NOT NULL,
        content TEXT,
        custom_fields JSON,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    )";
    
    $db->query($sql);
}

// Fetch one page of records, newest first
function getPaginatedData($page = 1, $perPage = ITEMS_PER_PAGE) {
    $page = max(1, (int) $page);
    $perPage = (int) $perPage;
    $offset = ($page - 1) * $perPage;
    $sql = "SELECT * FROM scraped_data ORDER BY created_at DESC LIMIT ?, ?";
    return DB::getInstance()->fetchAll($sql, [$offset, $perPage]);
}

// Count all stored records
function getTotalRecords() {
    $sql = "SELECT COUNT(*) as total FROM scraped_data";
    $result = DB::getInstance()->fetchOne($sql);
    return (int) $result['total'];
}
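
The paging arithmetic behind getPaginatedData() and the page count shown on the front end can be checked in isolation. The following is a minimal sketch, not part of the files above; the helper names pageToOffset and totalPages are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: zero-based row offset for a 1-based page number.
function pageToOffset(int $page, int $perPage): int
{
    return (max(1, $page) - 1) * $perPage;
}

// Hypothetical helper: number of pages needed to show $totalRecords rows.
function totalPages(int $totalRecords, int $perPage): int
{
    return (int) ceil($totalRecords / $perPage);
}

echo pageToOffset(1, 10), "\n"; // 0
echo pageToOffset(3, 10), "\n"; // 20
echo totalPages(25, 10), "\n";  // 3
echo totalPages(0, 10), "\n";   // 0
```

Page numbers below 1 are clamped to the first page, which mirrors the max(1, ...) guard in index.php.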

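3. Data scraper (includes/scraper.php)

php
<?php
require_once __DIR__ . '/functions.php';

class Scraper {
    private $siteKey;
    private $siteConfig;
    
    public function __construct($siteKey) {
        $this->siteConfig = TARGET_SITES[$siteKey] ?? null;
        if (!$this->siteConfig) {
            throw new Exception("Invalid site key: $siteKey");
        }
        // Remember the key so saveData() can record the source without searching the config
        $this->siteKey = $siteKey;
    }
    
    // Fetch the page, extract the configured fields, store new records, and return them
    public function scrape() {
        $html = $this->fetchContent($this->siteConfig['url']);
        $data = $this->parseContent($html);
        $this->saveData($data);
        return $data;
    }
    
    private function fetchContent($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30); // give up on slow hosts instead of hanging the run
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');
        
        $html = curl_exec($ch);
        if ($html === false) {
            $error = curl_error($ch);
            curl_close($ch);
            throw new Exception('Curl error: ' . $error);
        }
        curl_close($ch);
        
        return $html;
    }
    
    // Translate the simple selectors used in config.php (tag, .class, descendant spaces) into XPath.
    // DOMXPath does not understand CSS, so "h1.news-title" must become an explicit class test.
    private function cssToXPath($selector) {
        $xpath = '';
        foreach (preg_split('/\s+/', trim($selector)) as $part) {
            $pieces = explode('.', $part);
            $tag = array_shift($pieces);
            $xpath .= '//' . ($tag !== '' ? $tag : '*');
            foreach ($pieces as $class) {
                $xpath .= "[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
            }
        }
        return $xpath;
    }
    
    private function parseContent($html) {
        $dom = new DOMDocument();
        // Collect libxml warnings quietly instead of suppressing every error with @
        $previous = libxml_use_internal_errors(true);
        // The encoding hint makes libxml read the markup as UTF-8 (assumes the target pages are UTF-8)
        $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        $xpath = new DOMXPath($dom);
        
        $data = [];
        foreach ($this->siteConfig['selectors'] as $key => $selector) {
            $nodes = $xpath->query($this->cssToXPath($selector));
            $values = [];
            foreach ($nodes as $node) {
                $values[] = trim($node->nodeValue);
            }
            $data[$key] = $values;
        }
        
        // Combine the per-field lists into records by position (the i-th title goes with the i-th date, and so on)
        $records = [];
        $maxItems = $data ? max(array_map('count', $data)) : 0;
        for ($i = 0; $i < $maxItems; $i++) {
            $record = [];
            foreach ($data as $key => $values) {
                $record[$key] = $values[$i] ?? '';
            }
            $records[] = $record;
        }
        
        return $records;
    }
    
    private function saveData($records) {
        $db = DB::getInstance();
        
        foreach ($records as $record) {
            // title is NOT NULL in the table, so skip records without one
            $title = $record['title'] ?? '';
            if ($title === '') {
                continue;
            }
            
            // Skip records whose title already exists for this site
            $existing = $db->fetchOne(
                "SELECT id FROM scraped_data WHERE title = ? AND source_site = ?",
                [$title, $this->siteKey]
            );
            
            if (!$existing) {
                // Everything except title and content is stored as JSON in custom_fields
                $customFields = array_diff_key($record, array_flip(['title', 'content']));
                
                $db->query(
                    "INSERT INTO scraped_data (source_site, title, content, custom_fields) VALUES (?, ?, ?, ?)",
                    [
                        $this->siteKey,
                        $title,
                        $record['content'] ?? '',
                        json_encode($customFields, JSON_UNESCAPED_UNICODE)
                    ]
                );
            }
        }
    }
}

// Run every configured scraper; intended for scheduled (cron) use
function runScrapers() {
    foreach (array_keys(TARGET_SITES) as $siteKey) {
        try {
            $scraper = new Scraper($siteKey);
            $scraper->scrape();
            echo "Scraped data from $siteKey successfully.\n";
        } catch (Exception $e) {
            echo "Error scraping $siteKey: " . $e->getMessage() . "\n";
        }
    }
}

// When this file is executed directly from the command line (for example by cron), run all scrapers
if (PHP_SAPI === 'cli' && isset($_SERVER['SCRIPT_FILENAME']) && realpath($_SERVER['SCRIPT_FILENAME']) === __FILE__) {
    initDatabase();
    runScrapers();
}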
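
To see how per-field lookups become records, here is a self-contained sketch that runs DOMXPath queries against an inline HTML string instead of a live site. The sample markup and the function name extractRecords are made up for illustration. The XPath expressions are written out by hand with an explicit class test, because DOMXPath evaluates XPath, not CSS selectors.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: run one XPath expression per field and zip the results into records.
function extractRecords(string $html, array $fieldQueries): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $columns = [];
    foreach ($fieldQueries as $field => $query) {
        $columns[$field] = [];
        foreach ($xpath->query($query) as $node) {
            $columns[$field][] = trim($node->textContent);
        }
    }

    // The i-th value of every field forms the i-th record; missing values become ''.
    $records = [];
    $count = $columns ? max(array_map('count', $columns)) : 0;
    for ($i = 0; $i < $count; $i++) {
        $record = [];
        foreach ($columns as $field => $values) {
            $record[$field] = $values[$i] ?? '';
        }
        $records[] = $record;
    }
    return $records;
}

// Made-up sample markup standing in for a fetched page.
$html = '<html><body>'
    . '<article><h2>First</h2><p class="summary lead">Alpha</p></article>'
    . '<article><h2>Second</h2><p class="summary">Beta</p></article>'
    . '</body></html>';

// XPath equivalent of the CSS class selector ".summary".
$hasSummary = "contains(concat(' ', normalize-space(@class), ' '), ' summary ')";

$records = extractRecords($html, [
    'title'   => '//article//h2',
    'summary' => "//article//p[$hasSummary]",
]);

print_r($records);
```

Pairing values by position only works when every item on the page contains every field. If one article lacks a summary, later summaries shift up by one, so selecting each item's container first and querying inside it is the more robust design.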

4. Front-end page (index.php)

php
<?php
require_once __DIR__ . '/includes/functions.php';
initDatabase();

$page = isset($_GET['page']) ? max(1, intval($_GET['page'])) : 1;
$data = getPaginatedData($page);
$totalRecords = getTotalRecords();
$totalPages = (int) ceil($totalRecords / ITEMS_PER_PAGE);
?>
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Scraped Data Viewer</title>
    <link rel="stylesheet" href="assets/css/style.css">
</head>
<body>
    <div class="container">
        <header>
            <h1>Scraped Data</h1>
            <div class="stats">
                <?php echo $totalRecords; ?> records in total, page <?php echo $page; ?> of <?php echo $totalPages; ?>
            </div>
        </header>
        
        <div class="data-list" id="dataContainer">
            <?php foreach ($data as $item): ?>
                <div class="data-item">
                    <h3><?php echo htmlspecialchars($item['title']); ?></h3>
                    <div class="meta">
                        <span class="source">Source: <?php echo htmlspecialchars($item['source_site']); ?></span>
                        <span class="date"><?php echo htmlspecialchars($item['created_at']); ?></span>
                    </div>
                    <div class="content">
                        <?php echo nl2br(htmlspecialchars($item['content'])); ?>
                    </div>
                    <?php 
                    $customFields = json_decode($item['custom_fields'], true);
                    if ($customFields && is_array($customFields)): ?>
                        <div class="custom-fields">
                            <?php foreach ($customFields as $key => $value): ?>
                                <div class="field">
                                    <strong><?php echo htmlspecialchars($key); ?>:</strong>
                                    <span><?php echo htmlspecialchars($value); ?></span>
                                </div>
                            <?php endforeach; ?>
                        </div>
                    <?php endif; ?>
                </div>
            <?php endforeach; ?>
        </div>
        
        <div class="pagination">
            <?php if ($page > 1): ?>
                <a href="?page=<?php echo $page - 1; ?>" class="prev">Previous</a>
            <?php endif; ?>
            
            <?php 
            $startPage = max(1, $page - 2);
            $endPage = min($totalPages, $page + 2);
            
            for ($i = $startPage; $i <= $endPage; $i++): ?>
                <a href="?page=<?php echo $i; ?>" class="<?php echo $i == $page ? 'active' : ''; ?>">
                    <?php echo $i; ?>
                </a>
            <?php endfor; ?>
            
            <?php if ($page < $totalPages): ?>
                <a href="?page=<?php echo $page + 1; ?>" class="next">Next</a>
            <?php endif; ?>
        </div>
    </div>
    
    <script src="assets/js/script.js"></script>
</body>
</html>
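
The pagination block above links at most two pages on either side of the current one. The window calculation can be isolated as follows. This is a sketch only; the function name pageWindow and its $radius parameter are hypothetical, and it additionally clamps out-of-range page numbers, which index.php itself does not do.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: page numbers to link around the current page.
function pageWindow(int $page, int $totalPages, int $radius = 2): array
{
    if ($totalPages < 1) {
        return [];
    }
    $page = min(max(1, $page), $totalPages); // clamp out-of-range requests
    $start = max(1, $page - $radius);
    $end = min($totalPages, $page + $radius);
    return range($start, $end);
}

echo implode(' ', pageWindow(1, 10)), "\n";  // 1 2 3
echo implode(' ', pageWindow(5, 10)), "\n";  // 3 4 5 6 7
echo implode(' ', pageWindow(10, 10)), "\n"; // 8 9 10
echo implode(' ', pageWindow(99, 4)), "\n";  // 2 3 4
```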

5. CSS styles (assets/css/style.css)

css
/* Base styles */
body {
    font-family: 'Arial', sans-serif;
    line-height: 1.6;
    color: #333;
    background-color: #f5f5f5;
    margin: 0;
    padding: 0;
}

.container {
    max-width: 1200px;
    margin: 0 auto;
    padding: 20px;
}

header {
    background-color: #2c3e50;
    color: white;
    padding: 20px;
    border-radius: 5px;
    margin-bottom: 30px;
}

header h1 {
    margin: 0;
    font-size: 28px;
}

.stats {
    font-size: 14px;
    opacity: 0.8;
    margin-top: 10px;
}

/* Data list styles */
.data-list {
    display: grid;
    gap: 20px;
}

.data-item {
    background-color: white;
    border-radius: 5px;
    padding: 20px;
    box-shadow: 0 2px 5px rgba(0,0,0,0.1);
    transition: transform 0.3s ease, box-shadow 0.3s ease;
}

.data-item:hover {
    transform: translateY(-3px);
    box-shadow: 0 5px 15px rgba(0,0,0,0.1);
}

.data-item h3 {
    margin-top: 0;
    color: #2c3e50;
    border-bottom: 1px solid #eee;
    padding-bottom: 10px;
}

.meta {
    font-size: 14px;
    color: #7f8c8d;
    margin-bottom: 15px;
}

.meta .source {
    margin-right: 15px;
}

.content {
    margin-bottom: 15px;
}

.custom-fields {
    background-color: #f9f9f9;
    padding: 10px;
    border-radius: 3px;
    font-size: 14px;
}

.field {
    margin-bottom: 5px;
}

.field strong {
    display: inline-block;
    min-width: 80px;
    color: #7f8c8d;
}

/* Pagination styles */
.pagination {
    display: flex;
    justify-content: center;
    margin-top: 30px;
    flex-wrap: wrap;
}

.pagination a {
    display: inline-block;
    padding: 8px 16px;
    margin: 0 5px;
    background-color: white;
    border: 1px solid #ddd;
    border-radius: 3px;
    text-decoration: none;
    color: #2c3e50;
    transition: all 0.3s ease;
}

.pagination a:hover {
    background-color: #f1f1f1;
}

.pagination a.active {
    background-color: #2c3e50;
    color: white;
    border-color: #2c3e50;
}

/* Responsive layout */
@media (max-width: 768px) {
    .container {
        padding: 10px;
    }
    
    header {
        padding: 15px;
    }
    
    .data-item {
        padding: 15px;
    }
    
    .pagination a {
        padding: 6px 12px;
        margin: 3px;
    }
}

6. JavaScript (assets/js/script.js)

javascript
document.addEventListener('DOMContentLoaded', function() {
    // Hook for interactive features such as AJAX loading or animations
    
    // Example: a button that scrolls smoothly back to the top
    const scrollToTop = document.createElement('div');
    scrollToTop.className = 'scroll-to-top';
    scrollToTop.textContent = '↑';
    scrollToTop.style.display = 'none';
    scrollToTop.style.position = 'fixed';
    scrollToTop.style.bottom = '20px';
    scrollToTop.style.right = '20px';
    scrollToTop.style.backgroundColor = '#2c3e50';
    scrollToTop.style.color = 'white';
    scrollToTop.style.width = '40px';
    scrollToTop.style.height = '40px';
    scrollToTop.style.borderRadius = '50%';
    scrollToTop.style.textAlign = 'center';
    scrollToTop.style.lineHeight = '40px';
    scrollToTop.style.cursor = 'pointer';
    scrollToTop.style.boxShadow = '0 2px 5px rgba(0,0,0,0.2)';
    scrollToTop.style.zIndex = '1000';
    document.body.appendChild(scrollToTop);
    
    scrollToTop.addEventListener('click', function() {
        window.scrollTo({
            top: 0,
            behavior: 'smooth'
        });
    });
    
    window.addEventListener('scroll', function() {
        // Show the button only after the page has scrolled more than 300px
        if (window.scrollY > 300) {
            scrollToTop.style.display = 'block';
        } else {
            scrollToTop.style.display = 'none';
        }
    });
    
    // Further interactive features can be added here
    
    console.log('System loaded');
});

7. Pseudo-static configuration (.htaccess)

apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    
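    # Skip requests for real files or directories.
    # Note: RewriteCond lines apply only to the next RewriteRule, so these two guard just the page/ rule.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    
    # Rewrite pretty URLs to index.php
    RewriteRule ^page/([0-9]+)/?$ index.php?page=$1 [L,QSA]
    RewriteRule ^source/([a-zA-Z0-9_-]+)/?$ index.php?source=$1 [L,QSA]
    RewriteRule ^source/([a-zA-Z0-9_-]+)/page/([0-9]+)/?$ index.php?source=$1&page=$2 [L,QSA]
    
    # Further rewrite rules can be added here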
</IfModule>
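
The rewrite rules pass a source parameter, but index.php as shown only reads page. Below is a sketch of how the list query might honor both, assuming the filter is applied to the source_site column. The function name buildListQuery is hypothetical; the SQL and parameter list it returns are meant to be handed to DB::fetchAll().

```php
<?php
declare(strict_types=1);

// Hypothetical helper: build the SQL and parameters for one page of results,
// optionally restricted to a single source_site value.
function buildListQuery(?string $source, int $page, int $perPage): array
{
    $sql = 'SELECT * FROM scraped_data';
    $params = [];
    if ($source !== null && $source !== '') {
        $sql .= ' WHERE source_site = ?';
        $params[] = $source;
    }
    $sql .= ' ORDER BY created_at DESC LIMIT ?, ?';
    $params[] = (max(1, $page) - 1) * $perPage; // offset
    $params[] = $perPage;                       // row count
    return [$sql, $params];
}

// /source/example/page/2 is rewritten to index.php?source=example&page=2
[$sql, $params] = buildListQuery('example', 2, 10);
echo $sql, "\n";
echo json_encode($params), "\n";
```

The source value travels as a bound parameter rather than being concatenated into the SQL, so the URL segment cannot alter the query. The two LIMIT values are integers and are best bound as such.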

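Usage

  1. Initialize the database:

    • Create an empty database; index.php calls initDatabase(), which creates the scraped_data table on first load
    • Update the database settings in config.php
  2. Configure the target sites:

    • Add the sites to scrape and their selectors to the TARGET_SITES array in config.php
  3. Scheduled scraping:

    • Set up a cron job to run the scraping script periodically
    • Example cron command: php /path/to/project/includes/scraper.php (this works only if scraper.php invokes runScrapers() when executed directly)
  4. Pseudo-static URLs:

    • Once enabled, URLs such as /page/2 can be used in place of ?page=2
    • Make sure mod_rewrite is enabled on the server
  5. Front-end access:

    • Open index.php to view the scraped data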

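Extension suggestions

  1. Add user authentication to protect the scraping functions
  2. Implement more thorough data cleaning and processing
  3. Add a caching layer to improve performance
  4. Fetch several sites concurrently to speed up scraping (PHP has no built-in threads; curl_multi is one option)
  5. Log the scraping process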
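
As one possible shape for the caching suggestion, here is a minimal file-based sketch. The function name cachedCall, the JSON file layout, and the use of the system temp directory are all assumptions for illustration. A production setup might prefer APCu, Redis, or Memcached.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: return a cached value if it is younger than $ttl seconds,
// otherwise call $producer, store its result as JSON, and return it.
function cachedCall(string $key, int $ttl, callable $producer, ?string $dir = null)
{
    $dir = $dir ?? sys_get_temp_dir();
    $file = $dir . DIRECTORY_SEPARATOR . 'cache_' . md5($key) . '.json';

    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $cached = json_decode((string) file_get_contents($file), true);
        if (json_last_error() === JSON_ERROR_NONE) {
            return $cached;
        }
    }

    $value = $producer();
    // LOCK_EX avoids two requests interleaving their writes
    file_put_contents($file, json_encode($value, JSON_UNESCAPED_UNICODE), LOCK_EX);
    return $value;
}

$calls = 0;
$load = function () use (&$calls) {
    $calls++;
    return ['total' => 42];
};

$key = 'demo-' . uniqid('', true); // unique key so the demo always starts with a cold cache
$first = cachedCall($key, 60, $load);
$second = cachedCall($key, 60, $load);

echo $first['total'], ' ', $second['total'], ' ', $calls, "\n"; // 42 42 1
```

In this system the natural candidates for caching are the record count and the first page of results, since both are recomputed on every request but only change when a scrape runs.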

This system separates front-end and back-end files, supports pseudo-static URLs, and covers the full path from scraping to display. It can be extended and refined as needed.
