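Please write a web page that calls the Python statistics packages installed on the system, displays the data from a CSV text file in a grid, computes correlation coefficients and regression equations for variable columns, and accepts SQL statements for filtering, counting, aggregation, and so on.
Front-end page: index.html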
html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no">
<title>Smart Data Analysis Platform - CSV Data Analysis (Python Statistics Engine)</title>
<!-- Style libraries: Bootstrap 5 + Font Awesome 6 icons -->
<link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css">
<!-- DataTables grid enhancement (for front-end presentation; the data itself comes from the back end) -->
<link rel="stylesheet" href="https://cdn.datatables.net/1.13.4/css/jquery.dataTables.min.css">
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
<script src="https://cdn.datatables.net/1.13.4/js/jquery.dataTables.min.js"></script>
<style>
body {
background: #f4f7fc;
font-family: 'Segoe UI', Roboto, 'Helvetica Neue', sans-serif;
}
.header {
background: linear-gradient(135deg, #1e2a3e, #0f172a);
color: white;
padding: 1.5rem 0;
margin-bottom: 2rem;
box-shadow: 0 4px 12px rgba(0,0,0,0.1);
}
.card-custom {
border: none;
border-radius: 1rem;
box-shadow: 0 8px 20px rgba(0,0,0,0.05);
transition: all 0.2s;
background: white;
}
.card-header-custom {
background-color: rgba(30, 42, 62, 0.05);
border-bottom: 1px solid #e9ecef;
font-weight: 600;
padding: 1rem 1.25rem;
border-radius: 1rem 1rem 0 0 !important;
}
.btn-primary-custom {
background-color: #2c3e66;
border-color: #1f2c47;
}
.btn-primary-custom:hover {
background-color: #1e2a46;
}
.dataTables_wrapper .dataTables_paginate .paginate_button.current {
background: #2c3e66 !important;
color: white !important;
border: none;
}
#dataGrid {
font-size: 0.9rem;
}
.stat-badge {
background: #eef2ff;
padding: 0.25rem 0.75rem;
border-radius: 20px;
font-family: monospace;
}
pre {
background: #f8f9fc;
border-left: 4px solid #2c3e66;
padding: 0.8rem;
border-radius: 0.5rem;
}
.footer {
margin-top: 2rem;
text-align: center;
padding: 1rem;
color: #5b6e8c;
}
.loading-spinner {
display: none;
position: fixed;
top: 50%;
left: 50%;
transform: translate(-50%, -50%);
z-index: 1060;
background: rgba(0,0,0,0.6);
padding: 1.5rem 2rem;
border-radius: 2rem;
color: white;
font-weight: bold;
}
.sql-editor {
font-family: 'Courier New', monospace;
font-size: 0.9rem;
}
.btn-sm-icon {
margin-left: 6px;
}
</style>
</head>
<body>
<div class="loading-spinner" id="loadingSpinner">
<i class="fas fa-spinner fa-pulse fa-2x me-2"></i> Calling the Python statistics engine...
</div>
<div class="header">
<div class="container">
<h1><i class="fas fa-chart-line me-2"></i>Python Statistical Analysis Workbench</h1>
<p class="lead mb-0">Upload CSV / grid view | Correlation matrix | Linear regression equation | SQL-style filtering and aggregation</p>
</div>
</div>
<div class="container">
<!-- File upload area -->
<div class="card card-custom mb-4">
<div class="card-header-custom">
<i class="fas fa-upload me-2"></i> 1. Load a CSV data file
</div>
<div class="card-body">
<div class="row align-items-end">
<div class="col-md-8">
<label class="form-label fw-bold">Choose a comma-separated (CSV) file <span class="text-muted">(UTF-8 encoded, first row holds the column names)</span></label>
<input type="file" class="form-control" id="csvFileInput" accept=".csv">
</div>
<div class="col-md-4 mt-3 mt-md-0">
<button class="btn btn-primary w-100" id="uploadBtn"><i class="fas fa-database me-1"></i> Load and preview</button>
</div>
</div>
<div id="fileInfo" class="mt-3 small text-muted"></div>
</div>
</div>
<!-- Data grid display area -->
<div class="card card-custom mb-4">
<div class="card-header-custom d-flex justify-content-between align-items-center flex-wrap">
<span><i class="fas fa-table me-2"></i> Data grid (first 100 rows)</span>
<span class="badge bg-secondary" id="rowCountBadge">Not loaded</span>
</div>
<div class="card-body">
<div class="table-responsive">
<table id="dataGrid" class="table table-striped table-bordered w-100" style="font-size:0.85rem">
<thead>
<tr><th>Shown after data is loaded...</th></tr>
</thead>
<tbody></tbody>
</table>
</div>
</div>
</div>
<!-- Statistical analysis area: correlation coefficients + regression equation -->
<div class="row g-4 mb-4">
<div class="col-md-6">
<div class="card card-custom h-100">
<div class="card-header-custom">
<i class="fas fa-chart-scatter me-2"></i> Correlation matrix (Pearson)
</div>
<div class="card-body">
<div class="mb-3">
<label class="form-label">Select numeric columns (at least two)</label>
<select id="corrCols" class="form-select" multiple size="3"></select>
<div class="form-text">Hold Ctrl (or Cmd) to select multiple columns</div>
</div>
<button class="btn btn-outline-primary" id="calcCorrBtn"><i class="fas fa-calculator"></i> Compute correlation</button>
<div id="corrResult" class="mt-3 overflow-auto" style="max-height: 260px;">
<p class="text-muted">The correlation matrix appears here after you click the button</p>
</div>
</div>
</div>
</div>
<div class="col-md-6">
<div class="card card-custom h-100">
<div class="card-header-custom">
<i class="fas fa-chart-line me-2"></i> Linear regression equation (y = a + bx)
</div>
<div class="card-body">
<div class="row">
<div class="col-6">
<label class="form-label">Dependent variable Y</label>
<select id="regY" class="form-select"></select>
</div>
<div class="col-6">
<label class="form-label">Independent variable X</label>
<select id="regX" class="form-select"></select>
</div>
</div>
<button class="btn btn-outline-primary mt-3 w-100" id="calcRegBtn"><i class="fas fa-chart-simple"></i> Compute regression</button>
<div id="regResult" class="mt-3">
<p class="text-muted">The regression result shows the equation, R², and related statistics</p>
</div>
</div>
</div>
</div>
</div>
<!-- SQL-style query engine -->
<div class="card card-custom mb-4">
<div class="card-header-custom">
<i class="fas fa-code me-2"></i> SQL-style query engine (filtering, grouped counts, aggregation)
</div>
<div class="card-body">
<div class="alert alert-info small">
<i class="fas fa-info-circle"></i> Syntax examples:
<ul class="mb-0 mt-1">
<li><code>SELECT * FROM data WHERE age &gt; 30</code> → filter rows</li>
<li><code>SELECT department, COUNT(*) as cnt, AVG(salary) as avg_salary FROM data GROUP BY department</code> → grouped aggregation</li>
<li><code>SELECT SUM(amount), MAX(score) FROM data</code> → whole-table summary</li>
</ul>
Column names refer to the original CSV columns. WHERE, GROUP BY, and the common aggregate functions (COUNT, SUM, AVG, MIN, MAX) are supported.
</div>
<div class="mb-3">
<label class="form-label fw-bold"><i class="fas fa-terminal"></i> SQL query</label>
<textarea id="sqlQuery" rows="3" class="form-control sql-editor" placeholder="e.g. SELECT city, AVG(temperature) as avg_temp FROM data GROUP BY city ORDER BY avg_temp DESC"></textarea>
</div>
<div class="d-flex gap-2">
<button class="btn btn-success" id="executeSqlBtn"><i class="fas fa-play"></i> Run query</button>
<button class="btn btn-secondary" id="resetDataViewBtn"><i class="fas fa-undo-alt"></i> Reset to original view</button>
</div>
<div id="sqlResultPanel" class="mt-4">
<div class="fw-bold mb-2">Query result:</div>
<div class="table-responsive" style="max-height: 400px; overflow-y: auto;">
<table class="table table-sm table-bordered" id="sqlResultTable">
<thead><tr><th>---</th></tr></thead>
<tbody><tr><td class="text-muted">No query result yet</td></tr></tbody>
</table>
</div>
<div id="sqlMeta" class="small text-muted mt-1"></div>
</div>
</div>
</div>
<div class="footer">
<i class="fas fa-microchip"></i> Back end: Python (Flask + Pandas + SciPy) | Real-time analysis engine
</div>
</div>
<script>
// Global front-end state. All statistics, regression, and SQL work is done by the Python back end;
// the back end caches the dataset received through the upload POST for the current session.
let currentDataLoaded = false;
let currentColumns = [];
let currentNumericCols = [];
// Helpers to show and hide the loading indicator
function showLoading() {
$('#loadingSpinner').fadeIn(200);
}
function hideLoading() {
$('#loadingSpinner').fadeOut(200);
}
// Refresh the column drop-downs (correlation and regression variables)
function updateColumnSelectors(cols, numericCols) {
currentColumns = cols;
currentNumericCols = numericCols;
const corrSelect = $('#corrCols');
const regY = $('#regY');
const regX = $('#regX');
corrSelect.empty();
regY.empty();
regX.empty();
// Fill the drop-down options
cols.forEach(col => {
let opt = `<option value="${escapeHtml(col)}">${escapeHtml(col)}</option>`;
corrSelect.append(opt);
regY.append(opt);
regX.append(opt);
});
// Preselect up to two numeric columns for the correlation matrix
if(numericCols.length>0){
corrSelect.val(numericCols.slice(0, Math.min(2, numericCols.length)));
}
// Default to the first two numeric columns for the regression
if(numericCols.length >=2){
regY.val(numericCols[0]);
regX.val(numericCols[1]);
} else if(numericCols.length ===1){
regY.val(numericCols[0]);
regX.val(numericCols[0]);
}
}
// Render the data table (DataTable)
let dataTableInstance = null;
function renderDataGrid(rows, columns) {
if(dataTableInstance){
dataTableInstance.destroy();
$('#dataGrid thead').empty();
$('#dataGrid tbody').empty();
}
// Build the table header
let thead = '<tr>';
columns.forEach(col => {
thead += `<th>${escapeHtml(col)}</th>`;
});
thead += '</tr>';
$('#dataGrid thead').html(thead);
let tbodyHtml = '';
rows.forEach(row => {
let tr = '<tr>';
columns.forEach(col => {
let val = row[col] !== undefined && row[col] !== null ? row[col] : '';
tr += `<td>${escapeHtml(String(val))}</td>`;
});
tr += '</tr>';
tbodyHtml += tr;
});
$('#dataGrid tbody').html(tbodyHtml);
dataTableInstance = $('#dataGrid').DataTable({
paging: true,
pageLength: 10,
searching: true,
ordering: true,
info: true,
scrollX: true,
responsive: true
});
$('#rowCountBadge').text(`${rows.length} rows`);
}
// Escape text before inserting it into HTML (element content or attribute values)
function escapeHtml(str) {
if(str === undefined || str === null) return '';
return String(str).replace(/[&<>"']/g, function(m) {
if(m === '&') return '&amp;';
if(m === '<') return '&lt;';
if(m === '>') return '&gt;';
if(m === '"') return '&quot;';
return '&#39;';
});
}
// Shared helper for calling the back-end API
async function callApi(endpoint, data) {
showLoading();
try {
const response = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(data)
});
const result = await response.json();
hideLoading();
if (!response.ok || !result.success) {
throw new Error(result.error || 'Request failed');
}
return result;
} catch (err) {
hideLoading();
alert(`API error: ${err.message}`);
throw err;
}
}
// Upload the file
$('#uploadBtn').click(async () => {
const fileInput = document.getElementById('csvFileInput');
const file = fileInput.files[0];
if(!file){
alert('Please choose a CSV file first');
return;
}
showLoading();
const formData = new FormData();
formData.append('file', file);
try {
const response = await fetch('/api/upload', {
method: 'POST',
body: formData
});
const result = await response.json();
hideLoading();
if(!result.success){
alert('Upload or parsing failed: '+result.error);
return;
}
currentDataLoaded = true;
// Update the column information
updateColumnSelectors(result.columns, result.numeric_columns);
// Render the table (first 100 rows)
renderDataGrid(result.preview_rows, result.columns);
$('#fileInfo').html(`<i class="fas fa-check-circle text-success"></i> Loaded: ${escapeHtml(result.file_name)} | Total rows: ${result.total_rows} | Columns: ${result.columns.length}`);
} catch(err){
hideLoading();
alert('Upload failed: '+err.message);
}
});
// Correlation coefficients
$('#calcCorrBtn').click(async () => {
if(!currentDataLoaded){
alert('Please upload a CSV file first');
return;
}
const selectedCols = $('#corrCols').val();
if(!selectedCols || selectedCols.length < 2){
alert('Please select at least two numeric columns for the correlation matrix');
return;
}
try {
const res = await callApi('/api/correlation', { columns: selectedCols });
const corrMatrix = res.correlation_matrix;
// Build the result table
let html = '<table class="table table-sm table-bordered text-center"><thead><tr><th>Variable</th>';
const vars = Object.keys(corrMatrix);
vars.forEach(v => html += `<th>${escapeHtml(v)}</th>`);
html += '</tr></thead><tbody>';
vars.forEach(rowVar => {
html += `<tr><th>${escapeHtml(rowVar)}</th>`;
vars.forEach(colVar => {
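let val = corrMatrix[rowVar][colVar];
// A missing or null coefficient is shown as '-' instead of throwing in toFixed
let hasValue = (typeof val === 'number' && !Number.isNaN(val));
let formatted = hasValue ? val.toFixed(4) : '-';
let highlight = (hasValue && Math.abs(val) > 0.7 && val !== 1) ? 'bg-warning bg-opacity-25' : '';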
html += `<td class="${highlight}">${formatted}</td>`;
});
html += '</tr>';
});
html += '</tbody></table>';
$('#corrResult').html(html);
} catch(e){
$('#corrResult').html('<div class="alert alert-danger">Correlation failed: '+escapeHtml(e.message)+'</div>');
}
});
// Linear regression
$('#calcRegBtn').click(async () => {
if(!currentDataLoaded){
alert('Please load data first');
return;
}
const yVar = $('#regY').val();
const xVar = $('#regX').val();
if(!yVar || !xVar){
alert('Please select the dependent and independent variables');
return;
}
if(yVar === xVar){
alert('The dependent and independent variables must differ');
return;
}
try {
const res = await callApi('/api/regression', { y_col: yVar, x_col: xVar });
const { slope, intercept, r_squared, p_value, std_err } = res;
const equation = `y = ${slope.toFixed(4)} * x + ${intercept.toFixed(4)}`;
let html = `<div class="alert alert-light border"><strong>📈 Regression equation:</strong> ${equation}<br>`;
html += `<strong>R² (coefficient of determination):</strong> ${r_squared.toFixed(6)}<br>`;
html += `<strong>Slope (b):</strong> ${slope.toFixed(6)} | <strong>Intercept (a):</strong> ${intercept.toFixed(6)}<br>`;
if(p_value != null) html += `<strong>p-value (significance):</strong> ${p_value.toExponential(4)}<br>`;
if(std_err != null) html += `<strong>Standard error of the slope:</strong> ${std_err.toFixed(6)}`;
html += `</div>`;
$('#regResult').html(html);
} catch(e){
$('#regResult').html(`<div class="alert alert-danger">Regression failed: ${escapeHtml(e.message)}</div>`);
}
});
// SQL query
$('#executeSqlBtn').click(async () => {
if(!currentDataLoaded){
alert('Please upload a CSV file first');
return;
}
const sql = $('#sqlQuery').val().trim();
if(!sql){
alert('Please enter a SQL query');
return;
}
try {
const res = await callApi('/api/sql_query', { query: sql });
const { columns, rows, affected_rows, message } = res;
// Render the result table
let theadHtml = '<tr>';
columns.forEach(col => theadHtml += `<th>${escapeHtml(col)}</th>`);
theadHtml += '</tr>';
let tbodyHtml = '';
rows.forEach(row => {
let tr = '<tr>';
columns.forEach(col => {
let val = row[col] !== undefined && row[col] !== null ? row[col] : '';
tr += `<td>${escapeHtml(String(val))}</td>`;
});
tr += '</tr>';
tbodyHtml += tr;
});
$('#sqlResultTable thead').html(theadHtml);
$('#sqlResultTable tbody').html(tbodyHtml || '<tr><td colspan="100" class="text-muted">No rows returned</td></tr>');
$('#sqlMeta').html(`Rows affected/returned: ${affected_rows !== undefined ? affected_rows : rows.length}`);
} catch(e){
$('#sqlResultTable thead').html('<tr><th>Error</th></tr>');
$('#sqlResultTable tbody').html(`<tr><td class="text-danger">Query error: ${escapeHtml(e.message)}</td></tr>`);
$('#sqlMeta').html('');
}
});
// Reset the original data view: fetch the first 100 rows again from the back end and refresh the grid
$('#resetDataViewBtn').click(async () => {
if(!currentDataLoaded){
alert('No dataset is loaded');
return;
}
try {
const res = await callApi('/api/get_current_preview', {});
renderDataGrid(res.preview_rows, res.columns);
// Also refresh the column selectors (keep them in sync with the back end)
updateColumnSelectors(res.columns, res.numeric_columns);
$('#sqlResultTable thead').html('<tr><th>---</th></tr>');
$('#sqlResultTable tbody').html('<tr><td class="text-muted">View reset; SQL result cleared</td></tr>');
} catch(e){
alert('Reset failed: '+e.message);
}
});
</script>
</body>
</html>
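Back-end dependencies (install with pip)
bash
pip install flask pandas scipy numpy pandasql
What each package is for:
- flask - web framework
- pandas - data processing
- scipy - scientific computing (provides linregress for linear regression)
- numpy - numerical computing (a pandas dependency)
- pandasql - runs SQL queries against DataFrames
Python built-in libraries (no installation needed):
- traceback - exception stack traces
- json - JSON handling
- io - string I/O
- os - operating-system interface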
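Before starting the server it can help to confirm that the packages from the pip command above are actually importable in the interpreter that will run app.py. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this illustration.

```python
import importlib.util

# Package names taken from the pip install command above.
REQUIRED = ["flask", "pandas", "scipy", "numpy", "pandasql"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Running it with the same `python` command used for app.py shows whether the "system-installed" interpreter is the one that has the packages.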
Complete back-end code (corrected version)
python
# app.py - back-end service
# Install before running: pip install flask pandas scipy numpy pandasql
import json
import uuid
import pandas as pd
import numpy as np
from flask import Flask, request, jsonify, session
from scipy import stats
import traceback
from io import StringIO

app = Flask(__name__)
app.secret_key = 'data_analysis_secret_key_2024'

# Server-side cache of uploaded DataFrames, keyed by a per-browser session id.
# Only the id is stored in the signed cookie: browsers cap a cookie at about 4 KB,
# so serializing the whole DataFrame into the session fails for all but tiny files.
session_data = {}


def get_current_df():
    return session_data.get(session.get('sid'))


def save_df(df):
    if 'sid' not in session:
        session['sid'] = uuid.uuid4().hex
    session_data[session['sid']] = df


@app.route('/api/upload', methods=['POST'])
def upload_csv():
    try:
        if 'file' not in request.files:
            return jsonify({'success': False, 'error': 'No file provided'})
        file = request.files['file']
        if file.filename == '':
            return jsonify({'success': False, 'error': 'Empty file name'})
        df = pd.read_csv(file)
        if df.empty:
            return jsonify({'success': False, 'error': 'The CSV contains no data'})
        save_df(df)
        preview = df.head(100).fillna('').to_dict(orient='records')
        numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
        return jsonify({
            'success': True,
            'columns': df.columns.tolist(),
            'numeric_columns': numeric_cols,
            'total_rows': len(df),
            'preview_rows': preview,
            'file_name': file.filename
        })
    except Exception as e:
        return jsonify({'success': False, 'error': str(e)})


@app.route('/api/get_current_preview', methods=['POST'])
def get_preview():
    df = get_current_df()
    if df is None:
        return jsonify({'success': False, 'error': 'No data loaded'})
    preview = df.head(100).fillna('').to_dict(orient='records')
    numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
    return jsonify({
        'success': True,
        'columns': df.columns.tolist(),
        'numeric_columns': numeric_cols,
        'preview_rows': preview
    })


@app.route('/api/correlation', methods=['POST'])
def correlation():
    df = get_current_df()
    if df is None:
        return jsonify({'success': False, 'error': 'No data loaded'})
    data = request.json
    cols = data.get('columns', [])
    if len(cols) < 2:
        return jsonify({'success': False, 'error': 'At least two columns are required'})
    # Keep only numeric columns and drop rows with missing values
    sub_df = df[cols].select_dtypes(include=[np.number]).dropna()
    if sub_df.shape[1] < 2:
        return jsonify({'success': False, 'error': 'The selected columns are non-numeric or have too many missing values'})
    corr_mat = sub_df.corr().round(6)
    result = {}
    for c in corr_mat.columns:
        result[c] = corr_mat[c].to_dict()
    return jsonify({'success': True, 'correlation_matrix': result})


@app.route('/api/regression', methods=['POST'])
def regression():
    df = get_current_df()
    if df is None:
        return jsonify({'success': False, 'error': 'No data loaded'})
    data = request.json
    y_col = data.get('y_col')
    x_col = data.get('x_col')
    if not y_col or not x_col:
        return jsonify({'success': False, 'error': 'Missing variable'})
    # Keep only rows where both columns hold values that parse as numbers
    sub = df[[y_col, x_col]].dropna().copy()
    sub = sub[pd.to_numeric(sub[y_col], errors='coerce').notna() & pd.to_numeric(sub[x_col], errors='coerce').notna()]
    if len(sub) < 3:
        return jsonify({'success': False, 'error': 'Not enough valid data points'})
    x = sub[x_col].astype(float)
    y = sub[y_col].astype(float)
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    r_squared = r_value ** 2
    return jsonify({
        'success': True,
        'slope': slope,
        'intercept': intercept,
        'r_squared': r_squared,
        'p_value': p_value,
        'std_err': std_err
    })


@app.route('/api/sql_query', methods=['POST'])
def sql_query():
    df = get_current_df()
    if df is None:
        return jsonify({'success': False, 'error': 'Please upload data first'})
    data = request.json
    query = data.get('query', '')
    if not query:
        return jsonify({'success': False, 'error': 'The SQL statement is empty'})
    try:
        from pandasql import sqldf
        # Pass the table mapping explicitly. locals() called inside a lambda only
        # sees the lambda's own arguments, so pandasql could not find the DataFrame.
        # Mapping the name 'data' directly also avoids rewriting the query text.
        result = sqldf(query, {'data': df})
        if result.empty:
            return jsonify({'success': True, 'columns': [], 'rows': [], 'affected_rows': 0})
        result_display = result.head(500).fillna('').to_dict(orient='records')
        return jsonify({
            'success': True,
            'columns': result.columns.tolist(),
            'rows': result_display,
            'affected_rows': len(result)
        })
    except ImportError:
        return jsonify({'success': False, 'error': 'Please install pandasql: pip install pandasql'})
    except Exception as e:
        traceback.print_exc()
        return jsonify({'success': False, 'error': f'SQL execution error: {str(e)}'})


if __name__ == '__main__':
    print("Starting the data analysis service...")
    print("Make sure the front-end HTML is opened via http://127.0.0.1:5000")
    app.run(debug=True, port=5000)
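The correlation and regression endpoints above delegate the arithmetic to pandas and SciPy. To make clear what those numbers mean, here is a minimal standard-library sketch of the Pearson coefficient and of the least-squares line y = intercept + slope * x. The function names `pearson_r` and `simple_ols` are invented for this illustration and are not part of the back end; the sketch also omits the p-value and standard error that `linregress` reports.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simple_ols(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For one predictor, R squared equals the squared Pearson coefficient.
    r_squared = pearson_r(xs, ys) ** 2
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Points that lie exactly on y = 1 + 2x
    print(simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))
```

This is why the regression endpoint can compute `r_squared` as `r_value ** 2`: with a single independent variable the two quantities coincide.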
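Startup steps
- Install the dependencies:
bash
pip install flask pandas scipy numpy pandasql
- Save the back-end code as app.py
- Run the back end:
bash
python app.py
- Open the page: browse to http://127.0.0.1:5000 (the front-end HTML must be served from the same port, or served directly by Flask)
Notes
- traceback is a Python built-in library and can be used directly
- pandasql provides the SQL query support; the dependency can be removed if the SQL feature is not needed
- Make sure the CSV file is UTF-8 encoded to avoid garbled Chinese text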
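The UTF-8 note matters because CSV files exported on Chinese-language Windows systems are often GBK-encoded rather than UTF-8. One possible way to be more forgiving is to try UTF-8 first and fall back to a second encoding. The sketch below assumes that fallback is GB18030 (a superset of GBK); both the helper name `decode_csv_bytes` and the choice of fallback are assumptions for illustration, and a fallback like this is a guess that can silently mis-decode files in other encodings.

```python
def decode_csv_bytes(raw, encodings=("utf-8-sig", "gb18030")):
    """Try each encoding in order and return (text, encoding_used).

    "utf-8-sig" reads plain UTF-8 and also strips a leading byte-order mark.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode CSV with any of: " + ", ".join(encodings))


if __name__ == "__main__":
    sample = "姓名,年龄\n张三,30\n"
    print(decode_csv_bytes(sample.encode("utf-8"))[1])
    print(decode_csv_bytes(sample.encode("gbk"))[1])
```

The decoded text could then be handed to pandas as `pd.read_csv(io.StringIO(text))` instead of passing the uploaded file object directly.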
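To have the Flask back end serve the HTML page as well, the HTML file has to be placed where Flask looks for templates. The steps are:
- Create the folder structure:
your_project/
├── app.py # back-end code
├── templates/ # folder that holds the HTML
│ └── index.html # front-end page
- Edit app.py: add the import at the top of the file and the route after app = Flask(__name__):
python
from flask import render_template  # add this import
# add this route after app = Flask(__name__)
@app.route('/')
def index():
    return render_template('index.html')
- Save the HTML code as templates/index.html
- Run python app.py, then open http://127.0.0.1:5000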
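Result: uploading a CSV, previewing it, and computing correlation coefficients and the regression equation all work, but executing SQL raises an error, and asking it to fix the problem did not produce a working version either.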
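One way to take pandasql out of the picture when debugging the SQL feature is to run the query with the standard-library `sqlite3` module against an in-memory table named `data`. The sketch below is a stand-alone illustration under stated assumptions, not the generated back end: the function name `query_csv` is hypothetical, every column is declared `NUMERIC` so that numeric-looking text is stored as a number (which also turns values such as `007` into `7`), and only statements beginning with `SELECT` are accepted.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="data"):
    """Load CSV text into an in-memory SQLite table and run one SELECT on it."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are accepted in this sketch")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    conn = sqlite3.connect(":memory:")
    try:
        # Quote column names; NUMERIC affinity converts numeric-looking text.
        cols = ", ".join('"' + name.replace('"', '""') + '" NUMERIC' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        cur = conn.execute(sql)
        columns = [d[0] for d in cur.description]
        return columns, cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    text = "name,dept,salary\nAnn,A,10\nBob,A,20\nCid,B,30\n"
    print(query_csv(text, "SELECT dept, COUNT(*) AS cnt, AVG(salary) AS avg_salary "
                          "FROM data GROUP BY dept ORDER BY dept"))
```

Because the table is literally called `data`, the example queries shown in the page (`SELECT ... FROM data ...`) run unchanged, with no string replacement of the table name.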