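Sometimes a character vector in R, or a list in Python, needs to be split into a given number n of chunks.
Contiguous splitting cuts the sequence in order into consecutive blocks. Stride splitting instead sends neighboring elements to different chunks in rotation, so every run of n adjacent elements is spread across all n chunks. The most common use is organizing files; here a vector is used as the example.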
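For a plain Python list (no NumPy), both ideas fit in a few lines. The helper names `contiguous_split` and `stride_split` are hypothetical, chosen for this sketch. The contiguous version follows the convention that the first `len(seq) % n` chunks get one extra element.

```python
def contiguous_split(seq, n):
    """Split seq into n consecutive chunks whose sizes differ by at most 1."""
    q, r = divmod(len(seq), n)  # base chunk size, and how many chunks get one extra
    chunks, start = [], 0
    for i in range(n):
        size = q + (1 if i < r else 0)  # the first r chunks are one element longer
        chunks.append(seq[start:start + size])
        start += size
    return chunks


def stride_split(seq, n):
    """Put element j into chunk j % n (round-robin assignment)."""
    return [seq[i::n] for i in range(n)]


letters = list("ABCDEFG")
print(contiguous_split(letters, 3))  # [['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
print(stride_split(letters, 3))      # [['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
```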
1. Example data
1.1 Python
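```python
import string

import numpy as np

# The 26 letters twice, truncated to 30 characters: 'ABC...XYZABCD'
c = (string.ascii_uppercase * 2)[:30]
c = np.array(list(c))  # one array element per character
print(c)
```
Output:
```
['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z' 'A' 'B' 'C' 'D']
```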
1.2 R
```r
c <- rep(LETTERS, 2)[1:30]  # the 26 letters twice, keep the first 30
c
```
Output:
```
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z" "A" "B" "C" "D"
```
2. Contiguous splitting (contiguous blocks)
2.1 Python: np.array_split (the length need not be divisible by k)
```python
np.array_split(c, 3)
```
Returns:
```
[array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='<U1'),
 array(['K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T'], dtype='<U1'),
 array(['U', 'V', 'W', 'X', 'Y', 'Z', 'A', 'B', 'C', 'D'], dtype='<U1')]
```
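When the length is not divisible by k, `np.array_split` makes the first `len % k` chunks one element longer than the rest (this is NumPy's documented behavior). The sketch below recomputes those sizes by hand and checks them against NumPy. `chunk_sizes` is a hypothetical helper name, not part of NumPy.

```python
import numpy as np


def chunk_sizes(length, k):
    """Chunk sizes np.array_split produces: length % k chunks of length//k + 1, the rest length//k."""
    q, r = divmod(length, k)
    return [q + 1] * r + [q] * (k - r)


x = np.arange(31)
print([len(p) for p in np.array_split(x, 3)])  # [11, 10, 10]
print(chunk_sizes(31, 3))                      # [11, 10, 10]
```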
2.2 Python: np.split (strictly equal chunks; the length must be divisible by k)
```python
np.split(c, 3)  # c has 30 elements, so this gives 3 equal chunks of 10
```
Returns:
```
[array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='<U1'),
 array(['K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T'], dtype='<U1'),
 array(['U', 'V', 'W', 'X', 'Y', 'Z', 'A', 'B', 'C', 'D'], dtype='<U1')]
```
Note:
`np.split(c, 4)` raises a `ValueError` because 30 % 4 != 0.
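If you want an exact equal split when one exists and a balanced split otherwise, one option is to catch that `ValueError` and fall back to `np.array_split`. This is just one design choice sketched for illustration, not a NumPy feature; `split_strict_or_balanced` is a hypothetical name.

```python
import numpy as np


def split_strict_or_balanced(arr, k):
    """Try an exact equal split; if len(arr) % k != 0, fall back to a balanced split."""
    try:
        return np.split(arr, k), True  # True: every chunk has the same length
    except ValueError:                 # np.split raises this when the division is not equal
        return np.array_split(arr, k), False


parts, exact = split_strict_or_balanced(np.arange(30), 4)
print(exact, [len(p) for p in parts])  # False [8, 8, 7, 7]
```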
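2.3 R: contiguous splitting
```r
# Label the first 10 elements 1, the next 10 elements 2, and so on, then group by label
split(c, rep(seq_len(3), each = ceiling(length(c)/3)))
```
Returns:
```
$`1`
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
$`2`
[1] "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T"
$`3`
[1] "U" "V" "W" "X" "Y" "Z" "A" "B" "C" "D"
```
This is exact only when `length(c)` is divisible by 3. Otherwise the label vector is longer than `c`: `split()` uses only the first `length(c)` labels and warns that the data length is not a multiple of the split variable, so the last chunk is shorter (or even empty) and the sizes can differ from `np.array_split`.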
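The R expression builds a label vector (1, 1, ..., 2, 2, ..., 3, 3, ...) and groups elements by label. The same idea can be sketched in NumPy to see what happens when the length is not divisible by k. `split_by_block_labels` is a hypothetical helper; truncating the labels to `len(arr)` mirrors what R's `split()` does when the grouping vector is longer than the data.

```python
import math

import numpy as np


def split_by_block_labels(arr, k):
    """Mimic rep(seq_len(k), each = ceiling(n / k)) followed by split() in R."""
    n = len(arr)
    labels = np.repeat(np.arange(k), math.ceil(n / k))[:n]  # keep only the first n labels
    return [arr[labels == g] for g in range(k)]             # later groups may be short or empty


print([len(g) for g in split_by_block_labels(np.arange(31), 3)])  # [11, 11, 9]
print([len(g) for g in np.array_split(np.arange(31), 3)])         # [11, 10, 10]
print([len(g) for g in split_by_block_labels(np.arange(9), 4)])   # [3, 3, 3, 0]
```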
3. Stride splitting (stride / cyclic)
3.1 Python: assign elements to groups in rotation by index
```python
def stride(c, k):
    """Group i collects c[i], c[i + k], c[i + 2k], ... (every k-th element from index i)."""
    return {i: c[i::k] for i in range(k)}


stride(c, 3)
```
Returns:
```
{0: array(['A', 'D', 'G', 'J', 'M', 'P', 'S', 'V', 'Y', 'B'], dtype='<U1'),
 1: array(['B', 'E', 'H', 'K', 'N', 'Q', 'T', 'W', 'Z', 'C'], dtype='<U1'),
 2: array(['C', 'F', 'I', 'L', 'O', 'R', 'U', 'X', 'A', 'D'], dtype='<U1')}
```
`[i::k]` means:
- `i`: start index (counting from 0)
- empty: end position (omitted, meaning "up to the end")
- `k`: step (take one element out of every k)
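Example:
```python
x = np.arange(1, 11)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x[0::3]  # start at index 0, take every 3rd element → [1, 4, 7, 10]
x[1::3]  # start at index 1, take every 3rd element → [2, 5, 8]
x[2::3]  # start at index 2, take every 3rd element → [3, 6, 9]
```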
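When the length is divisible by k, the same stride groups can also be read off a reshaped array. `reshape(-1, k)` lays the elements out in rows of k, so column i holds `x[i::k]`, and transposing turns those columns into rows. This sketch assumes divisibility: with the 10-element array above and k = 3, `reshape` would raise a `ValueError`, so a 12-element array is used instead.

```python
import numpy as np

x = np.arange(1, 13)         # [1, 2, ..., 12]; 12 is divisible by k
k = 3
groups = x.reshape(-1, k).T  # row i of the result equals x[i::k]
print(groups)
# [[ 1  4  7 10]
#  [ 2  5  8 11]
#  [ 3  6  9 12]]
```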
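3.2 R: stride splitting
```r
# 1:3 is recycled along c: elements 1, 4, 7, ... get label 1, elements 2, 5, 8, ... get label 2, etc.
split(c, 1:3)
```
Returns:
```
$`1`
[1] "A" "D" "G" "J" "M" "P" "S" "V" "Y" "B"
$`2`
[1] "B" "E" "H" "K" "N" "Q" "T" "W" "Z" "C"
$`3`
[1] "C" "F" "I" "L" "O" "R" "U" "X" "A" "D"
```
If `length(c)` is not a multiple of k, the grouping is still cyclic, but R warns that the data length is not a multiple of the split variable. `split(c, rep_len(seq_len(k), length(c)))` builds the full-length label vector explicitly and avoids the warning.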
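R's `split(c, 1:k)` amounts to labeling element j (0-based) with `j % k` and grouping by label. The NumPy sketch below does that labeling explicitly and checks that it gives the same groups as the slicing approach in 3.1, including when the length is not divisible by k. `split_by_cyclic_labels` is a hypothetical name.

```python
import numpy as np


def split_by_cyclic_labels(arr, k):
    """Label element j with j % k (like recycling 1:k in R) and group by label."""
    labels = np.arange(len(arr)) % k
    return [arr[labels == g] for g in range(k)]


x = np.arange(10)
groups = split_by_cyclic_labels(x, 3)
print([g.tolist() for g in groups])                                   # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(all(np.array_equal(g, x[i::3]) for i, g in enumerate(groups)))  # True
```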
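4. Summary

| Goal | Python | R | Notes |
|---|---|---|---|
| Contiguous blocks | `np.array_split(c, k)` | `split(c, rep(seq_len(k), each = ceiling(length(c)/k)))` | Python handles lengths not divisible by k (the first `len % k` chunks get one extra element); the R version warns in that case and may leave a short or empty last chunk |
| Contiguous, strictly equal blocks | `np.split(c, k)` | N/A | Length must be divisible by k, otherwise an error is raised |
| Stride / round-robin groups | `{i: c[i::k] for i in range(k)}` | `split(c, 1:k)` | Elements are assigned to groups in rotation by index |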
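The three Python rows of the table can be wrapped in a single helper. `split_into` and its `mode` values are hypothetical names made up for this sketch. The usage example assumes a small list of file names, since sorting files is the typical use case mentioned at the start.

```python
import numpy as np


def split_into(arr, k, mode="contiguous"):
    """Split arr into k parts; mode is 'contiguous', 'strict', or 'stride'."""
    arr = np.asarray(arr)
    if mode == "contiguous":
        return np.array_split(arr, k)         # chunk sizes may differ by 1
    if mode == "strict":
        return np.split(arr, k)               # ValueError unless len(arr) % k == 0
    if mode == "stride":
        return [arr[i::k] for i in range(k)]  # round-robin by index
    raise ValueError(f"unknown mode: {mode!r}")


files = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]  # hypothetical file names
print([p.tolist() for p in split_into(files, 2)])
# [['a.txt', 'b.txt', 'c.txt'], ['d.txt', 'e.txt']]
print([p.tolist() for p in split_into(files, 2, mode="stride")])
# [['a.txt', 'c.txt', 'e.txt'], ['b.txt', 'd.txt']]
```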