Handling Large-Volume Data Queries - Memory Optimization and Efficient Display
📋 Problem Background
In real-world development, we frequently need to query large volumes of data, for example:
- User behavior log analysis (millions of records)
- Order history exports (tens of millions of records)
- Report queries (aggregation over massive datasets)
- Data migration and synchronization (TB-scale data)
Core challenges:
- Out-of-memory errors: the result set exceeds the JVM heap
- Response timeouts: queries run long enough for the API call to time out
- Poor user experience: the frontend waits indefinitely or freezes
- System stability: heavy queries degrade other business operations
🎯 Technical Solutions
1. Paginated Queries - The Baseline
```java
@RestController
@RequestMapping("/api/data")
public class DataController {

    @Autowired
    private DataService dataService;

    /**
     * Paginated query over a large dataset.
     */
    @GetMapping("/list")
    public Result<PageResult<DataVO>> getDataList(
            @RequestParam(defaultValue = "1") Integer pageNum,
            @RequestParam(defaultValue = "100") Integer pageSize,
            @RequestParam(required = false) String keyword) {
        // Cap the page size to guard against abusive requests
        pageSize = Math.min(pageSize, 1000);
        PageResult<DataVO> result = dataService.getDataByPage(pageNum, pageSize, keyword);
        return Result.success(result);
    }
}

@Service
public class DataService {

    @Autowired
    private DataMapper dataMapper;

    public PageResult<DataVO> getDataByPage(Integer pageNum, Integer pageSize, String keyword) {
        // Use the MyBatis-Plus pagination plugin
        Page<DataEntity> page = new Page<>(pageNum, pageSize);
        LambdaQueryWrapper<DataEntity> wrapper = new LambdaQueryWrapper<>();
        if (StringUtils.hasText(keyword)) {
            // Nest the OR so it cannot leak into conditions appended later
            wrapper.and(w -> w.like(DataEntity::getName, keyword)
                    .or()
                    .like(DataEntity::getDescription, keyword));
        }
        Page<DataEntity> resultPage = dataMapper.selectPage(page, wrapper);
        // Map entities to view objects
        List<DataVO> voList = resultPage.getRecords().stream()
                .map(this::convertToVO)
                .collect(Collectors.toList());
        return new PageResult<>(voList, resultPage.getTotal(), pageNum, pageSize);
    }
}
```
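The snippet above references a few project helper types (Result, DataVO, convertToVO) whose implementations are omitted. For completeness, a minimal sketch of what the PageResult envelope could look like, assuming only the four fields used in the constructor call above:

```java
import java.util.List;

/**
 * Minimal paging envelope (hypothetical sketch; a real project
 * would have its own version with equivalent fields).
 */
public class PageResult<T> {
    private final List<T> records; // current page of data
    private final long total;      // total matching rows
    private final int pageNum;     // 1-based page index
    private final int pageSize;    // rows per page

    public PageResult(List<T> records, long total, int pageNum, int pageSize) {
        this.records = records;
        this.total = total;
        this.pageNum = pageNum;
        this.pageSize = pageSize;
    }

    public List<T> getRecords() { return records; }
    public long getTotal() { return total; }
    public int getPageNum() { return pageNum; }
    public int getPageSize() { return pageSize; }
}
```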
2. Streaming Queries - Memory Optimization
```java
@Slf4j
@Service
public class StreamDataService {

    @Autowired
    private SqlSessionFactory sqlSessionFactory;

    /**
     * Stream a large result set straight into an Excel download,
     * so the full dataset never sits in memory at once.
     */
    public void exportLargeData(HttpServletResponse response, String keyword) {
        response.setContentType("application/vnd.ms-excel");
        response.setHeader("Content-Disposition", "attachment; filename=data.xlsx");
        try (SqlSession sqlSession = sqlSessionFactory.openSession();
             OutputStream outputStream = response.getOutputStream();
             ExcelWriter excelWriter = EasyExcel.write(outputStream, DataVO.class).build()) {
            DataMapper mapper = sqlSession.getMapper(DataMapper.class);
            WriteSheet writeSheet = EasyExcel.writerSheet("data").build();
            // MyBatis cursor query: rows are fetched and processed as a stream.
            // Note: with MySQL this only truly streams when useCursorFetch=true
            // is set on the data source (see section 4).
            try (Cursor<DataEntity> cursor = mapper.selectByCursor(keyword)) {
                List<DataVO> batch = new ArrayList<>(1000);
                for (DataEntity entity : cursor) {
                    batch.add(convertToVO(entity));
                    // Flush to Excel in batches to bound memory usage
                    if (batch.size() >= 1000) {
                        excelWriter.write(batch, writeSheet);
                        batch.clear();
                    }
                }
                // Write the final partial batch
                if (!batch.isEmpty()) {
                    excelWriter.write(batch, writeSheet);
                }
            }
        } catch (Exception e) {
            log.error("Data export failed", e);
            throw new BusinessException("Export failed");
        }
    }
}

// Mapper interface
@Mapper
public interface DataMapper extends BaseMapper<DataEntity> {

    /**
     * Cursor query for streaming over a large result set.
     */
    @Select("SELECT * FROM data_table WHERE name LIKE CONCAT('%', #{keyword}, '%')")
    @Options(resultSetType = ResultSetType.FORWARD_ONLY, fetchSize = 1000)
    Cursor<DataEntity> selectByCursor(@Param("keyword") String keyword);
}
```
3. Asynchronous Processing - Better User Experience
```java
@Slf4j
@Service
public class AsyncDataService {

    @Autowired
    private RedisTemplate<String, Object> redisTemplate;

    @Autowired
    private DataMapper dataMapper; // was missing in the original wiring

    /**
     * Run a large query asynchronously, tracking status and progress in Redis.
     * updateProgress, storePartialResult, and convertToVO are project helpers
     * omitted here.
     */
    @Async("taskExecutor")
    public CompletableFuture<String> queryLargeDataAsync(String taskId, QueryParam param) {
        String statusKey = "task:status:" + taskId;
        String resultKey = "task:result:" + taskId;
        try {
            // Mark the task as in progress
            redisTemplate.opsForValue().set(statusKey, "PROCESSING", 30, TimeUnit.MINUTES);
            List<DataVO> result = new ArrayList<>();
            int pageSize = 1000;
            int pageNum = 1;
            while (true) {
                List<DataEntity> batch = dataMapper.selectByPage(param, pageNum, pageSize);
                if (batch.isEmpty()) {
                    break;
                }
                // Convert and append to the result set
                List<DataVO> voList = batch.stream()
                        .map(this::convertToVO)
                        .collect(Collectors.toList());
                result.addAll(voList);
                // Report progress (approximate; the last page may be short)
                updateProgress(taskId, pageNum * pageSize);
                pageNum++;
                // Spill to intermediate storage (file or cache) to bound heap usage
                if (result.size() > 10000) {
                    storePartialResult(taskId, result);
                    result.clear();
                }
            }
            // Store the final result
            redisTemplate.opsForValue().set(resultKey, result, 1, TimeUnit.HOURS);
            redisTemplate.opsForValue().set(statusKey, "COMPLETED", 1, TimeUnit.HOURS);
            return CompletableFuture.completedFuture(taskId);
        } catch (Exception e) {
            redisTemplate.opsForValue().set(statusKey, "FAILED", 1, TimeUnit.HOURS);
            log.error("Async query failed: taskId={}", taskId, e);
            throw new RuntimeException(e);
        }
    }

    /**
     * Look up the status of a submitted task.
     */
    public TaskStatus getTaskStatus(String taskId) {
        String statusKey = "task:status:" + taskId;
        String progressKey = "task:progress:" + taskId;
        String status = (String) redisTemplate.opsForValue().get(statusKey);
        Integer progress = (Integer) redisTemplate.opsForValue().get(progressKey);
        return new TaskStatus(taskId, status, progress);
    }
}

@RestController
@RequestMapping("/api/async")
public class AsyncDataController {

    @Autowired
    private AsyncDataService asyncDataService;

    /**
     * Submit an asynchronous query task.
     */
    @PostMapping("/query")
    public Result<String> submitQuery(@RequestBody QueryParam param) {
        String taskId = UUID.randomUUID().toString();
        asyncDataService.queryLargeDataAsync(taskId, param);
        return Result.success(taskId);
    }

    /**
     * Poll the task status.
     */
    @GetMapping("/status/{taskId}")
    public Result<TaskStatus> getStatus(@PathVariable String taskId) {
        TaskStatus status = asyncDataService.getTaskStatus(taskId);
        return Result.success(status);
    }
}
```
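`@Async("taskExecutor")` refers to an executor bean that the snippet never defines; without `@EnableAsync` the annotation is silently ignored, and without the named bean Spring falls back to a default executor. A minimal configuration sketch (the pool sizes are illustrative assumptions, not recommendations):

```java
import java.util.concurrent.ThreadPoolExecutor;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
@EnableAsync
public class AsyncConfig {

    /**
     * Executor bean named "taskExecutor", matching @Async("taskExecutor").
     * Pool sizes below are assumed values; tune them for your workload.
     */
    @Bean("taskExecutor")
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(4);
        executor.setMaxPoolSize(8);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("async-query-");
        // Apply back-pressure instead of dropping tasks when the queue fills
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}
```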
4. Database Optimization
```java
@Configuration
public class DatabaseConfig {

    /**
     * Data source tuned for large queries.
     */
    @Bean
    @Primary
    public DataSource primaryDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/db");
        config.setUsername("user");
        config.setPassword("password");
        // Pool settings for long-running, data-heavy queries
        config.setMaximumPoolSize(20);
        config.setMinimumIdle(5);
        config.setConnectionTimeout(60000);
        config.setIdleTimeout(300000);
        config.setMaxLifetime(900000);
        // MySQL-specific tuning
        config.addDataSourceProperty("useServerPrepStmts", "true");
        config.addDataSourceProperty("prepStmtCacheSize", "250");
        config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
        // Required for true server-side streaming with MyBatis cursors
        config.addDataSourceProperty("useCursorFetch", "true");
        config.addDataSourceProperty("defaultFetchSize", "1000");
        return new HikariDataSource(config);
    }
}

// Optimized query SQL. Note: @Select/@Options only work on mapper
// interfaces, so this must be an @Mapper interface, not an @Repository class.
@Mapper
public interface OptimizedDataMapper {

    /**
     * Keyset (seek) pagination: paging by "id > lastId" stays fast on deep
     * pages, unlike OFFSET, which scans and discards all preceding rows.
     */
    @Select("""
            SELECT * FROM data_table
            WHERE id > #{lastId}
            AND name LIKE CONCAT('%', #{keyword}, '%')
            ORDER BY id ASC
            LIMIT #{pageSize}
            """)
    List<DataEntity> selectByIdCursor(@Param("lastId") Long lastId,
                                      @Param("keyword") String keyword,
                                      @Param("pageSize") Integer pageSize);

    /**
     * Count query. COUNT(1) and COUNT(*) are equivalent in MySQL; for very
     * large tables, consider caching the count or using an estimate instead.
     */
    @Select("""
            SELECT COUNT(1) FROM data_table
            WHERE name LIKE CONCAT('%', #{keyword}, '%')
            """)
    Long countByKeyword(@Param("keyword") String keyword);
}
```
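Keyset pagination needs a driver loop that carries the last seen id forward. A minimal sketch of how a caller might walk the whole filtered table in bounded batches, assuming DataEntity exposes getId() (the processBatch callback is a hypothetical hook for illustration):

```java
import java.util.List;
import java.util.function.Consumer;

@Service
public class KeysetScanService {

    @Autowired
    private OptimizedDataMapper optimizedDataMapper;

    /**
     * Walk the filtered table in fixed-size batches via keyset pagination.
     */
    public void scanAll(String keyword, Consumer<List<DataEntity>> processBatch) {
        long lastId = 0L;          // start below the smallest id
        final int pageSize = 1000; // illustrative batch size
        while (true) {
            List<DataEntity> batch =
                    optimizedDataMapper.selectByIdCursor(lastId, keyword, pageSize);
            if (batch.isEmpty()) {
                break; // no rows left
            }
            processBatch.accept(batch);
            // Advance the cursor past the last id seen
            lastId = batch.get(batch.size() - 1).getId();
        }
    }
}
```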
5. Frontend Optimization
```vue
<!-- Virtual scrolling component -->
<template>
  <div class="virtual-scroll-container" ref="container" @scroll="handleScroll">
    <div class="virtual-scroll-content" :style="{ height: totalHeight + 'px' }">
      <div
        class="virtual-scroll-list"
        :style="{ transform: `translateY(${offsetY}px)` }"
      >
        <div
          v-for="item in visibleItems"
          :key="item.id"
          class="virtual-scroll-item"
          :style="{ height: itemHeight + 'px' }"
        >
          <slot :item="item"></slot>
        </div>
      </div>
    </div>
    <!-- Load-more indicator -->
    <div v-if="loading" class="loading">Loading...</div>
  </div>
</template>

<script>
export default {
  name: 'VirtualScroll',
  props: {
    items: Array,
    itemHeight: { type: Number, default: 50 },
    bufferSize: { type: Number, default: 5 }
  },
  data() {
    return {
      containerHeight: 0,
      scrollTop: 0,
      loading: false
    }
  },
  computed: {
    totalHeight() {
      return this.items.length * this.itemHeight
    },
    visibleCount() {
      return Math.ceil(this.containerHeight / this.itemHeight)
    },
    startIndex() {
      return Math.max(0, Math.floor(this.scrollTop / this.itemHeight) - this.bufferSize)
    },
    endIndex() {
      return Math.min(this.items.length, this.startIndex + this.visibleCount + this.bufferSize * 2)
    },
    visibleItems() {
      return this.items.slice(this.startIndex, this.endIndex)
    },
    offsetY() {
      return this.startIndex * this.itemHeight
    }
  },
  methods: {
    handleScroll(e) {
      this.scrollTop = e.target.scrollTop
      // Load the next page when scrolled near the bottom
      if (this.scrollTop + this.containerHeight >= this.totalHeight - 100) {
        this.loadMore()
      }
    },
    loadMore() {
      if (this.loading) return
      this.loading = true
      // $emit does not return a promise, so awaiting it would be a no-op;
      // instead, pass a callback the parent invokes when its fetch completes
      this.$emit('load-more', () => {
        this.loading = false
      })
    }
  },
  mounted() {
    this.containerHeight = this.$refs.container.clientHeight
  }
}
</script>
```
6. Chunked Download
```java
@RestController
@RequestMapping("/api/download")
public class DownloadController {

    /**
     * Download a large exported file in chunks.
     * getDataFile(taskId) resolves the previously exported file (omitted here).
     */
    @GetMapping("/large-data")
    public ResponseEntity<StreamingResponseBody> downloadLargeData(
            @RequestParam String taskId,
            @RequestParam(defaultValue = "0") Long offset,
            @RequestParam(defaultValue = "1048576") Long chunkSize) {
        StreamingResponseBody stream = outputStream -> {
            try (FileInputStream fis = new FileInputStream(getDataFile(taskId))) {
                // skip() may skip fewer bytes than requested, so loop until done
                long toSkip = offset;
                while (toSkip > 0) {
                    long skipped = fis.skip(toSkip);
                    if (skipped <= 0) {
                        break;
                    }
                    toSkip -= skipped;
                }
                byte[] buffer = new byte[8192];
                long remaining = chunkSize;
                int bytesRead;
                while (remaining > 0 && (bytesRead = fis.read(buffer, 0,
                        (int) Math.min(buffer.length, remaining))) != -1) {
                    outputStream.write(buffer, 0, bytesRead);
                    remaining -= bytesRead;
                }
            }
        };
        HttpHeaders headers = new HttpHeaders();
        headers.add("Content-Type", "application/octet-stream");
        headers.add("Accept-Ranges", "bytes");
        headers.add("Content-Range", String.format("bytes %d-%d/*", offset, offset + chunkSize - 1));
        return ResponseEntity.status(HttpStatus.PARTIAL_CONTENT)
                .headers(headers)
                .body(stream);
    }
}
```
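On the consuming side, a client advances the offset until it receives a chunk shorter than the requested size. A sketch using the JDK's HttpClient against the endpoint above (the base URL, task id, and output filename are placeholder assumptions):

```java
import java.io.OutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedDownloadClient {

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:8080/api/download/large-data"; // assumed host/port
        String taskId = "demo-task";  // hypothetical task id
        long chunkSize = 1024 * 1024; // 1 MiB, matching the server default
        HttpClient client = HttpClient.newHttpClient();

        try (OutputStream out = Files.newOutputStream(Path.of("data.xlsx"))) {
            long offset = 0;
            while (true) {
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create(baseUrl + "?taskId=" + taskId
                                + "&offset=" + offset + "&chunkSize=" + chunkSize))
                        .GET()
                        .build();
                HttpResponse<byte[]> response =
                        client.send(request, HttpResponse.BodyHandlers.ofByteArray());
                byte[] body = response.body();
                out.write(body);
                offset += body.length;
                // A short (or empty) chunk means we've reached end of file
                if (body.length < chunkSize) {
                    break;
                }
            }
        }
    }
}
```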
🚀 Performance Optimization Strategies
1. Caching Strategy
```java
@Service
public class CachedDataService {

    @Autowired
    private DataService dataService; // was missing in the original wiring

    // Cache small pages only. Note that hashCode() keys can collide;
    // a key built from the individual query fields is safer in practice.
    @Cacheable(value = "dataCache", key = "#param.hashCode()",
            condition = "#param.pageSize <= 100")
    public PageResult<DataVO> getCachedData(QueryParam param) {
        return dataService.getDataByPage(param);
    }

    // Cache summary statistics of large datasets in Redis
    @Cacheable(value = "summaryCache", key = "'summary:' + #keyword")
    public DataSummary getDataSummary(String keyword) {
        return dataService.calculateSummary(keyword);
    }
}
```
2. Index Optimization
```sql
-- Composite index to speed up filtered queries
CREATE INDEX idx_data_name_time ON data_table(name, create_time);

-- Covering index to avoid lookups back to the clustered index
CREATE INDEX idx_data_covering ON data_table(name, status, id, create_time);

-- Range partitioning for large tables. MySQL requires the partitioning
-- column to appear in every unique key, so create_time must be part
-- of the primary key.
CREATE TABLE data_table (
    id BIGINT,
    name VARCHAR(255),
    create_time DATETIME,
    PRIMARY KEY (id, create_time)
) PARTITION BY RANGE (YEAR(create_time)) (
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026)
);
```
📊 Monitoring and Alerting
```java
@Slf4j
@Component
public class DataQueryMonitor {

    private final MeterRegistry meterRegistry;

    public DataQueryMonitor(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    @EventListener
    public void handleLargeQuery(LargeQueryEvent event) {
        // Record the query duration carried by the event. (Starting and
        // stopping a Timer.Sample inside this handler, as the original did,
        // would time the handler itself, not the query.) getDuration() is
        // assumed to return the measured java.time.Duration of the query.
        Timer.builder("large.query.duration")
                .tag("type", event.getQueryType())
                .register(meterRegistry)
                .record(event.getDuration());

        // Record the result size. A DistributionSummary fits per-event values
        // better than a Gauge, which would keep pointing at the first event.
        DistributionSummary.builder("large.query.size")
                .tag("type", event.getQueryType())
                .register(meterRegistry)
                .record(event.getDataSize());

        // Alert on high memory usage
        if (event.getMemoryUsage() > 0.8) {
            log.warn("Large query memory usage is high: {}%", event.getMemoryUsage() * 100);
        }
    }
}
```
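The monitor assumes a LargeQueryEvent is published wherever a large query completes. A minimal sketch of such an event, mirroring the getters used above (the Duration field in particular is an assumption, not something the original defines):

```java
import java.time.Duration;

/**
 * Hypothetical event matching the getters the monitor uses.
 */
public class LargeQueryEvent {
    private final String queryType;   // e.g. "export", "report"
    private final long dataSize;      // number of rows returned
    private final double memoryUsage; // heap usage ratio, 0.0-1.0
    private final Duration duration;  // measured query time (assumed field)

    public LargeQueryEvent(String queryType, long dataSize,
                           double memoryUsage, Duration duration) {
        this.queryType = queryType;
        this.dataSize = dataSize;
        this.memoryUsage = memoryUsage;
        this.duration = duration;
    }

    public String getQueryType() { return queryType; }
    public long getDataSize() { return dataSize; }
    public double getMemoryUsage() { return memoryUsage; }
    public Duration getDuration() { return duration; }
}
```

Publication is then a one-liner at the end of any large query, e.g. `applicationEventPublisher.publishEvent(new LargeQueryEvent("export", rowCount, heapRatio, elapsed))`.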
💡 Best Practices
1. Query Optimization
- **Avoid SELECT ***: fetch only the columns you need
- Use LIMIT: cap the number of rows returned
- Use indexes deliberately: avoid full table scans
- Process in batches: query and handle large datasets batch by batch
2. Memory Management
- Stream processing: use cursor queries and the Stream API
- Release promptly: clear intermediate data as soon as it is processed
- Monitor memory: watch JVM heap usage in real time (see the sketch after this list)
- Set limits: cap the size of any query result
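As a concrete illustration of the last two points, a small guard like the following can watch heap usage while results accumulate and abort a query before it takes down the JVM. This is a sketch: the 80% threshold is an arbitrary assumption, and it presumes -Xmx is set so the heap maximum is defined.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/**
 * Simple heap-usage guard for result-accumulation loops.
 */
public class MemoryGuard {

    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Heap usage as a ratio of the configured maximum (assumes -Xmx is set). */
    public static double heapUsageRatio() {
        MemoryUsage heap = MEMORY.getHeapMemoryUsage();
        return (double) heap.getUsed() / heap.getMax();
    }

    /** Throws if heap usage crosses the threshold (0.8 is an assumed value). */
    public static void checkOrAbort(double threshold) {
        double ratio = heapUsageRatio();
        if (ratio > threshold) {
            throw new IllegalStateException(
                    String.format("Aborting query: heap usage %.0f%% exceeds %.0f%%",
                            ratio * 100, threshold * 100));
        }
    }
}
```

Calling `MemoryGuard.checkOrAbort(0.8)` inside the batch loop of the async service above would stop accumulation before an OutOfMemoryError rather than after.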
3. User Experience
- Asynchronous processing: run long queries asynchronously
- Progress feedback: show query progress and an estimated completion time
- Paged display: use virtual scrolling or pagination on the frontend
- Cache results: cache frequently repeated queries
🎯 Summary
Handling large-volume queries requires attention on several fronts:
- Backend: paginated queries, streaming, asynchronous execution
- Database: index tuning, partitioned tables, connection pool configuration
- Frontend: virtual scrolling, lazy loading, chunked downloads
- Monitoring: memory metrics, performance indicators, alerting
The key is to pick the techniques that fit the specific business scenario and strike a balance between performance, user experience, and system stability.