使用Arrow管理数据

在之前的数据挖掘:是时候更新一下TCGA的数据了推文中,保存TCGA的数据就是使用Arrow格式,因为占空间小,读写速度快,多语言支持(我主要使用的3种语言都支持)

Format

https://arrow.apache.org

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.

Language Supported

Arrow's libraries implement the format and provide building blocks for a range of use cases, including high performance analytics. Many popular projects use Arrow to ship columnar data efficiently or as the basis for analytic engines.

Libraries are available for C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust.

Ecosystem

Apache Arrow is software created by and for the developer community. We are dedicated to open, kind communication and consensus decisionmaking. Our committers come from a range of organizations and backgrounds, and we welcome all to participate with us.

R

install.packages("arrow")

library(arrow)

write iris to iris.arrow and compressed by zstd

arrow::write_ipc_file(iris,'iris.arrow', compression = "zstd",compression_level=1)

read iris.arrow as DataFrame

iris=arrow::read_ipc_file('iris.arrow')

python

conda install -y pandas pyarrow

import pandas as pd

read iris.arrow as DataFrame

iris=pd.read_feather('iris.arrow')

write iris to iris.arrow and compressed by zstd

iris.to_feather('iris.arrow',compression='zstd', compression_level=1)

Julia

using Pkg

Pkg.add(["Arrow","DataFrames"])

using Arrow, DataFrames

read iris.arrow as DataFrame

iris = Arrow.Table("iris.arrow") |> DataFrame

write iris to iris.arrow, using 8 threads and compressed by zstd

Arrow.write("iris.arrow",iris,compress=:zstd,ntasks=8)

相关推荐
源力祁老师22 分钟前
Odoo日志系统核心组件_logger
网络·数据库·php
洋不写bug2 小时前
数据库基础核心操作——CRUD,超详细解析,搭配表格讲解和需求的实现。
数据库
马猴烧酒.2 小时前
JAVA后端用户登录与鉴权详解
java·数据库·sql
heartbeat..2 小时前
Redis 常用命令全解析:基础、进阶与场景化实战
java·数据库·redis·缓存
数据知道2 小时前
PostgreSQL 实战:一文掌握如何优雅的进行递归查询?
大数据·数据库·postgresql
陌上丨2 小时前
MySQL8.0高可用集群架构实战
数据库·mysql·架构
重生之绝世牛码2 小时前
Linux软件安装 —— ClickHouse单节点安装(rpm安装、tar安装两种安装方式)
大数据·linux·运维·数据库·clickhouse·软件安装·clickhouse单节点
一只自律的鸡3 小时前
【MySQL】第十一章 存储过程和存储函数
数据库·mysql
翔云1234563 小时前
MySQL 中的 utf8 vs utf8mb4 区别
数据库·mysql
数据知道3 小时前
PostgreSQL 实战:索引的设计原则详解
数据库·postgresql