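Uploading files to HDFS (Hadoop Distributed File System) asynchronously with Flask and Celery keeps the web request short: the request handler only saves the file and queues a task, while a Celery worker performs the slower transfer to HDFS in the background. The steps and code below walk through a basic implementation.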
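Step 1: Environment setup
- Install the required libraries
    pip install Flask Celery hdfs
    pip install redis  # if using Redis as the message broker
- Start Redis (Celery needs a message broker; Redis is used here as an example)
    redis-server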
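Before starting the worker, it can help to confirm that something is actually listening on the broker port. The following is a minimal sketch using only the standard library; `broker_reachable` is a hypothetical helper name, and the default host and port simply mirror the `redis://localhost:6379/0` URL used in this guide.

```python
import socket


def broker_reachable(host='localhost', port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    This only checks that something is listening on the port; it does not
    confirm that the listener is Redis.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling `broker_reachable()` after running redis-server should return True if Redis is on its default port.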
Step 2: Create the Flask app and the Celery task
- Create the Flask app
    from flask import Flask, request, jsonify

    app = Flask(__name__)
- Configure Celery
    from celery import Celery

    def make_celery(app):
        # Create a Celery instance using the broker and result backend from the Flask config.
        celery = Celery(
            app.import_name,
            broker=app.config['CELERY_BROKER_URL'],
            backend=app.config['CELERY_RESULT_BACKEND']
        )
        celery.conf.update(app.config)
        TaskBase = celery.Task

        class ContextTask(TaskBase):
            abstract = True

            # Run every task inside the Flask application context.
            def __call__(self, *args, **kwargs):
                with app.app_context():
                    return TaskBase.__call__(self, *args, **kwargs)

        celery.Task = ContextTask
        return celery

    app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'
    app.config['CELERY_RESULT_BACKEND'] = 'redis://localhost:6379/0'
    celery = make_celery(app)
- Define the upload task
    from hdfs import InsecureClient

    # WebHDFS address of the NameNode; 50070 is the Hadoop 2.x default (Hadoop 3.x uses 9870).
    hdfs = InsecureClient('http://localhost:50070', user='your_username')

    @celery.task()
    def upload_to_hdfs(file_path, hdfs_path):
        # Stream the local file to HDFS, replacing any existing file at hdfs_path.
        with open(file_path, 'rb') as f:
            hdfs.write(hdfs_path, f, overwrite=True)
        return f'File uploaded to {hdfs_path}'
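The upload task leaves the temporary local file on disk once the transfer finishes. The following is a minimal sketch of one way to clean it up, written as a plain function so it can be tried without a cluster. `upload_and_cleanup` is a hypothetical helper, and `client` is assumed to be any object with a `write(hdfs_path, data, overwrite=...)` method, such as the `InsecureClient` instance created above.

```python
import os


def upload_and_cleanup(client, file_path, hdfs_path):
    """Upload a local file through `client`, then delete the local copy.

    The local file is removed only after the write succeeds, so a failed
    upload leaves it in place and the task can be retried.
    """
    with open(file_path, 'rb') as f:
        client.write(hdfs_path, f, overwrite=True)
    # Reached only if write() did not raise.
    os.remove(file_path)
    return f'File uploaded to {hdfs_path}'
```

Inside the Celery task, the body could then reduce to a single call that passes the module-level HDFS client together with the two paths.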
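Step 3: Create the upload endpoint
- Define the Flask route
    import os
    from werkzeug.utils import secure_filename

    @app.route('/upload', methods=['POST'])
    def upload():
        file = request.files.get('file')  # None if the form field is missing
        # Strip directory components and unsafe characters from the client-supplied name.
        filename = secure_filename(file.filename) if file else ''
        if filename:
            os.makedirs('temp', exist_ok=True)
            file_path = os.path.join('temp', filename)  # temporary local copy; the worker must be able to read this path
            file.save(file_path)
            hdfs_path = '/user/your_username/' + filename  # destination path in HDFS
            upload_to_hdfs.delay(file_path, hdfs_path)  # queue the asynchronous upload to HDFS
            return jsonify({'status': 'success', 'message': 'Upload in progress'})
        return jsonify({'status': 'error', 'message': 'No file provided'}), 400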
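Paths derived from a client-supplied file name deserve some care: two uploads with the same name can collide in the temporary directory, and a name containing path separators could point outside it. The following is a sketch of one way to derive both paths using only the standard library. `build_paths` is a hypothetical helper, and the default directories are the placeholder values used in this guide.

```python
import os
import posixpath
import re
import uuid


def build_paths(filename, temp_dir='temp', hdfs_dir='/user/your_username'):
    """Return (local_temp_path, hdfs_path) for an uploaded file name."""
    # Keep only the final path component, handling both / and \ separators.
    base = re.split(r'[\\/]', filename)[-1]
    # Replace anything outside a conservative character set, drop leading dots.
    base = re.sub(r'[^A-Za-z0-9._-]', '_', base).lstrip('.')
    if not base:
        raise ValueError('empty or unusable file name')
    # A random prefix keeps concurrent uploads of the same name apart locally.
    local_path = os.path.join(temp_dir, f'{uuid.uuid4().hex}_{base}')
    # HDFS paths always use forward slashes, regardless of the host OS.
    hdfs_path = posixpath.join(hdfs_dir, base)
    return local_path, hdfs_path
```

For example, `build_paths('../../etc/passwd')` yields an HDFS path of `/user/your_username/passwd` and a local path inside `temp`. Note that the HDFS path is still shared by uploads with the same name, so a later upload overwrites an earlier one there.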
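Step 4: Run the app and the Celery worker
- Run the Flask app
    FLASK_APP=your_application.py flask run --port=5000
  where your_application.py is your Flask application file.
- Start the Celery worker
    celery -A your_application.celery worker --loglevel=info
  Here your_application refers to the same file, given as a module name without the .py extension.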
Step 5: Test the upload
You can test the upload endpoint with Postman or curl:
    curl -X POST -F "file=@path_to_your_file" http://localhost:5000/upload
Replace path_to_your_file with the path to your file.
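If curl or Postman is not available, the same request can be sent from Python. The following sketch builds a multipart/form-data POST with only the standard library. `build_upload_request` is a hypothetical helper, and the form field name `file` matches the one the `/upload` route from Step 3 reads.

```python
import urllib.request
import uuid


def build_upload_request(url, filename, content, field='file'):
    """Build a multipart/form-data POST request carrying one file."""
    boundary = uuid.uuid4().hex
    body = b''.join([
        f'--{boundary}\r\n'.encode(),
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'.encode(),
        b'Content-Type: application/octet-stream\r\n\r\n',
        content,
        f'\r\n--{boundary}--\r\n'.encode(),
    ])
    headers = {'Content-Type': f'multipart/form-data; boundary={boundary}'}
    return urllib.request.Request(url, data=body, headers=headers, method='POST')


# Example usage (requires the Flask app to be running):
# req = build_upload_request('http://localhost:5000/upload', 'example.txt', b'hello')
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

The helper only constructs the request; nothing is sent until it is passed to `urllib.request.urlopen`.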
With these steps in place, the Flask app accepts a file and responds immediately, and a Celery worker uploads the file to HDFS in the background.