使用Arrow管理数据

在之前的数据挖掘:是时候更新一下TCGA的数据了推文中,保存TCGA的数据就是使用Arrow格式,因为占空间小,读写速度快,多语言支持(我主要使用的3种语言都支持)

Format

https://arrow.apache.org

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.

Language Supported

Arrow's libraries implement the format and provide building blocks for a range of use cases, including high performance analytics. Many popular projects use Arrow to ship columnar data efficiently or as the basis for analytic engines.

Libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

Ecosystem

Apache Arrow is software created by and for the developer community. We are dedicated to open, kind communication and consensus decisionmaking. Our committers come from a range of organizations and backgrounds, and we welcome all to participate with us.

R

install.packages("arrow")

library(arrow)

write iris to iris.arrow and compressed by zstd

arrow::write_ipc_file(iris,'iris.arrow', compression = "zstd",compression_level=1)

read iris.arrow as DataFrame

iris=arrow::read_ipc_file('iris.arrow')

python

conda install -y pandas pyarrow

import pandas as pd

read iris.arrow as DataFrame

iris=pd.read_feather('iris.arrow')

write iris to iris.arrow and compressed by zstd

iris.to_feather('iris.arrow',compression='zstd', compression_level=1)

Julia

using Pkg

Pkg.add(["Arrow","DataFrames"])

using Arrow, DataFrames

read iris.arrow as DataFrame

iris = Arrow.Table("iris.arrow") |> DataFrame

write iris to iris.arrow, using 8 threads and compressed by zstd

Arrow.write("iris.arrow",iris,compress=:zstd,ntasks=8)

相关推荐
qq_3482318510 分钟前
MySQL 与 PostgreSQL PL/pgSQL 的对比详解
数据库·mysql·postgresql
玩转数据库管理工具FOR DBLENS31 分钟前
DBLens:开启数据库管理新纪元——永久免费,智能高效的国产化开发利器
数据结构·数据库·测试工具·数据库开发
芝麻馅汤圆儿37 分钟前
sockperf 工具
linux·服务器·数据库
IndulgeCui37 分钟前
金仓数据库征文_使用KDTS迁移mysql至金仓数据库问题处理记录分享
数据库
wsx_iot1 小时前
mysql的快照读和当前读
数据库·mysql
梁萌1 小时前
MySQL分区表使用保姆级教程
数据库·mysql·优化·分区表·分区·partitions
期待のcode1 小时前
MyBatis-Plus的Wrapper核心体系
java·数据库·spring boot·后端·mybatis
透明的玻璃杯1 小时前
sqlite数据库链接池二
数据库·oracle·sqlite
老华带你飞1 小时前
出行旅游安排|基于springboot出行旅游安排系统(源码+数据库+文档)
java·数据库·vue.js·spring boot·后端·spring·旅游
Logic1012 小时前
《数据库运维》 郭文明 实验4 数据库备份与恢复实验核心操作与思路解析
运维·数据库·sql·mysql·学习笔记·形考作业·国家开放大学