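Managing Data with Arrow

In the earlier post "Data mining: time to update the TCGA data", I saved the TCGA data in Arrow format because it takes little disk space, reads and writes quickly, and is supported across languages (all three languages I mainly use support it).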

Format

https://arrow.apache.org

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
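To make "columnar" and "zero-copy" more concrete, here is a minimal sketch that uses only the Python standard library. It is not Arrow's actual memory layout; the column names and values are made up for illustration. It only shows the general idea that a column stored as one contiguous buffer can be viewed and sliced without copying.

```python
from array import array

# Row-oriented layout: one record per element (illustrative values only).
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5},
    {"sepal_length": 4.9, "sepal_width": 3.0},
    {"sepal_length": 6.3, "sepal_width": 3.3},
]

# Column-oriented layout: each column is one contiguous buffer of doubles.
columns = {
    name: array("d", (row[name] for row in rows))
    for name in ("sepal_length", "sepal_width")
}

# A memoryview exposes the buffer without copying it,
# and slicing the view does not copy either.
view = memoryview(columns["sepal_length"])
tail = view[1:]

# A change to the underlying buffer is visible through the slice,
# which shows that the slice shares memory with the column.
columns["sepal_length"][1] = 5.0
shared = tail[0] == 5.0

# An analytic operation only has to touch the one column it needs.
mean_width = sum(columns["sepal_width"]) / len(columns["sepal_width"])
```

Arrow applies the same principle across processes and languages by standardizing the buffer layout, which is why a file written from one language can be read from another without a conversion step.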

Supported Languages

Arrow's libraries implement the format and provide building blocks for a range of use cases, including high performance analytics. Many popular projects use Arrow to ship columnar data efficiently or as the basis for analytic engines.

Libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

Ecosystem

Apache Arrow is software created by and for the developer community. We are dedicated to open, kind communication and consensus decisionmaking. Our committers come from a range of organizations and backgrounds, and we welcome all to participate with us.

R

install.packages("arrow")

library(arrow)

Write iris to iris.arrow, compressed with zstd

arrow::write_ipc_file(iris,'iris.arrow', compression = "zstd",compression_level=1)

Read iris.arrow as a data frame

iris=arrow::read_ipc_file('iris.arrow')

Python

conda install -y pandas pyarrow

import pandas as pd

Read iris.arrow as a DataFrame

iris=pd.read_feather('iris.arrow')

Write iris to iris.arrow, compressed with zstd

iris.to_feather('iris.arrow',compression='zstd', compression_level=1)

Julia

using Pkg

Pkg.add(["Arrow","DataFrames"])

using Arrow, DataFrames

Read iris.arrow as a DataFrame

iris = Arrow.Table("iris.arrow") |> DataFrame

Write iris to iris.arrow, compressed with zstd, using up to 8 concurrent tasks

Arrow.write("iris.arrow",iris,compress=:zstd,ntasks=8)
