使用Arrow管理数据

在之前的数据挖掘:是时候更新一下TCGA的数据了推文中,保存TCGA的数据就是使用Arrow格式,因为占空间小,读写速度快,多语言支持(我主要使用的3种语言都支持)

Format

https://arrow.apache.org

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.

Language Supported

Arrow's libraries implement the format and provide building blocks for a range of use cases, including high performance analytics. Many popular projects use Arrow to ship columnar data efficiently or as the basis for analytic engines.

Libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

Ecosystem

Apache Arrow is software created by and for the developer community. We are dedicated to open, kind communication and consensus decisionmaking. Our committers come from a range of organizations and backgrounds, and we welcome all to participate with us.

R

install.packages("arrow")

library(arrow)

write iris to iris.arrow and compressed by zstd

arrow::write_ipc_file(iris,'iris.arrow', compression = "zstd",compression_level=1)

read iris.arrow as DataFrame

iris=arrow::read_ipc_file('iris.arrow')

python

conda install -y pandas pyarrow

import pandas as pd

read iris.arrow as DataFrame

iris=pd.read_feather('iris.arrow')

write iris to iris.arrow and compressed by zstd

iris.to_feather('iris.arrow',compression='zstd', compression_level=1)

Julia

using Pkg

Pkg.add(["Arrow","DataFrames"])

using Arrow, DataFrames

read iris.arrow as DataFrame

iris = Arrow.Table("iris.arrow") |> DataFrame

write iris to iris.arrow, using 8 threads and compressed by zstd

Arrow.write("iris.arrow",iris,compress=:zstd,ntasks=8)

相关推荐
小光学长15 分钟前
基于vue框架的电信用户业务管理系统的设计与实现8ly70(程序+源码+数据库+调试部署+开发环境)带论文文档1万字以上,文末可获取,系统界面在最后面。
数据库
程序员不想YY啊30 分钟前
MySQL元数据库完全指南:探秘数据背后的数据
数据库·mysql·oracle
数据最前线33 分钟前
Doris表设计与分区策略:让海量数据管理更高效
数据库
时光追逐者43 分钟前
MongoDB从入门到实战之MongoDB快速入门(附带学习路线图)
数据库·学习·mongodb
头顶秃成一缕光1 小时前
Redis的主从模式和哨兵模式
数据库·redis·缓存
AIGC大时代1 小时前
高效使用DeepSeek对“情境+ 对象 +问题“型课题进行开题!
数据库·人工智能·算法·aigc·智能写作·deepseek
博睿谷IT99_1 小时前
数据库证书可以选OCP认证吗?
数据库·oracle·开闭原则·ocp认证
乐维_lwops1 小时前
数据库监控 | MongoDB监控全解析
数据库·mongodb·数据库监控
观无1 小时前
Redis安装及入门应用
数据库·redis·缓存
柏油2 小时前
MySql InnoDB 事务实现之 undo log 日志
数据库·后端·mysql