RepeatModeler 2.0.7 Installation and Usage -- Bioinformatics Tools 75 - CSDN Blog
Runtime Environment and Installation Notes
Official website: https://www.repeatmasker.org/RepeatMasker/
Prerequisites
- A Unix-like system with Perl 5.8.0 or later installed
- Python 3 with the h5py library (installation guide: https://docs.h5py.org/en/latest/build.html)
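Before going further, it can help to confirm these prerequisites. The following is a minimal bash sketch, not part of RepeatMasker: it compares the installed Perl version against 5.8.0 and checks that `python3` can import `h5py`. The function names `version_ge` and `check_prereqs` are hypothetical.

```bash
#!/usr/bin/env bash
# Prerequisite check (illustrative sketch; not shipped with RepeatMasker).

# version_ge A B: succeed if dotted version A >= B.
# Only numeric components are supported (e.g. 5.36.0, not 5.36.0-rc1).
version_ge() {
  local -a a b
  local i x y
  IFS=. read -ra a <<< "$1"
  IFS=. read -ra b <<< "$2"
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}   # missing components count as 0
    y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0   # all components equal
}

check_prereqs() {
  local status=0 perl_ver
  # %vd prints Perl's version as a dotted string such as 5.36.0
  if perl_ver=$(perl -e 'printf "%vd", $^V' 2>/dev/null); then
    if version_ge "$perl_ver" 5.8.0; then
      echo "Perl $perl_ver: OK"
    else
      echo "Perl $perl_ver is older than 5.8.0"; status=1
    fi
  else
    echo "Perl not found"; status=1
  fi
  if python3 -c 'import h5py' 2>/dev/null; then
    echo "Python 3 with h5py: OK"
  else
    echo "python3 or h5py missing (try: pip3 install --user h5py)"; status=1
  fi
  return "$status"
}

check_prereqs || echo "Some prerequisites are missing."
```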
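Sequence Search Engine
RepeatMasker relies on a sequence search engine to find repeats. The supported engines are Cross_Match, RMBlast, HMMER and WUBlast/ABBlast; download and install at least one of them:
- Cross_Match: part of the Phrap package. Download it from http://www.phrap.org by choosing the Phred/Phrap/Consed suite.
- RMBlast: a version of NCBI BLAST modified for RepeatMasker/RepeatModeler, available at http://www.repeatmasker.org/rmblast. Version 2.13.0 or later is recommended.
- HMMER: download version 3.2.1 from http://hmmer.org/.
- ABBlast/WUBlast: the rights to WU-BLAST (BLAST 2.0) are held by Advanced Biocomputing; see http://blast.advbiocomp.com/licensing/ for licensing and downloads. RepeatMasker 3.2.8 and later are fully compatible with both tools.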
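To see which of these engines the configure step will be able to find, you can look for one characteristic executable per engine on `PATH`. In this sketch the executable names (`rmblastn` for RMBlast, `cross_match` for Cross_Match, `nhmmer` for HMMER) are assumptions about a typical installation, and ABBlast is not checked. The hypothetical `find_engines` prints the directory holding each program, since configure asks for directories such as `RMBLAST_DIR` and `HMMER_DIR`.

```bash
#!/usr/bin/env bash
# List RepeatMasker search engines found on PATH (illustrative sketch).
# ASSUMPTION: one key executable identifies each engine; ABBlast is skipped.
find_engines() {
  local entry exe path
  for entry in RMBlast:rmblastn Cross_Match:cross_match HMMER:nhmmer; do
    exe=${entry#*:}
    if path=$(command -v "$exe"); then
      # configure asks for the directory, so strip the program name
      printf '%s\t%s\n' "${entry%%:*}" "${path%/*}"
    fi
  done
}

found=$(find_engines)
if [ -n "$found" ]; then
  printf '%s\n' "$found"
else
  echo "No supported search engine found on PATH" >&2
fi
```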
Tandem Repeat Finder (TRF)
TRF (Tandem Repeat Finder, by G. Benson et al.) is freely available from http://tandem.bu.edu/trf/trf.html or https://github.com/Benson-Genomics-Lab/TRF. RepeatMasker was developed and tested with TRF 4.0.9.
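The configure script later asks for `TRF_PRGM`, the full path including the program name. A small helper such as the hypothetical `trf_path` below can resolve it from `PATH`; the optional argument covers the case (an assumption about your setup) where the downloaded binary keeps a versioned file name instead of `trf`.

```bash
#!/usr/bin/env bash
# Resolve the absolute path of the TRF binary for configure's TRF_PRGM prompt.
# Usage: trf_path [program_name]   (defaults to "trf"; hypothetical helper)
trf_path() {
  local name=${1:-trf} p
  if ! p=$(command -v "$name"); then
    echo "$name not found on PATH" >&2
    return 1
  fi
  case $p in
    /*) printf '%s\n' "$p" ;;           # already absolute
    *)  printf '%s/%s\n' "$PWD" "$p" ;; # came from a relative PATH entry
  esac
}

if p=$(trf_path); then
  echo "TRF_PRGM: $p"
fi
```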
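Repeat Database (FamDB)
RepeatMasker can use custom sequence libraries or work directly with the Dfam database. Dfam is an open database of transposable element (TE) profile hidden Markov models (HMMs) and consensus sequences.
The current RepeatMasker release does not bundle a TE database. You can download Dfam in FamDB HDF5 (.h5) format with the FamDB tool: https://github.com/Dfam-consortium/FamDB. You can also obtain the RepeatMasker edition of RepBase and use it together with Dfam; it is available from http://www.girinst.org.
FamDB 3.0.0 source archive: https://github.com/Dfam-consortium/FamDB/archive/refs/tags/3.0.0.tar.gz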
Installation Steps
01 Download the Package
- Latest release (2026-06-05): RepeatMasker-4.2.4.tar.gz
- Previous stable release (2025-12-17): RepeatMasker-4.2.3.tar.gz
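The download can be scripted as well. The sketch below assumes, without confirmation from this post, that release archives are served from the official site as `RepeatMasker-<version>.tar.gz`; check the download page for the exact file name. The hypothetical `verify_tarball` then uses `gzip -t` and `tar -tzf` to catch a truncated or corrupt download before you extract it.

```bash
#!/usr/bin/env bash
# Download a RepeatMasker release and check the archive (illustrative sketch).
fetch_repeatmasker() {
  if [ $# -ne 1 ]; then
    echo "usage: fetch_repeatmasker VERSION" >&2
    return 2
  fi
  # ASSUMED URL pattern; confirm on https://www.repeatmasker.org/RepeatMasker/
  wget "https://www.repeatmasker.org/RepeatMasker/RepeatMasker-$1.tar.gz"
}

# Succeed only if FILE is a readable gzip-compressed tar archive.
verify_tarball() {
  if [ $# -ne 1 ]; then
    echo "usage: verify_tarball FILE" >&2
    return 2
  fi
  if ! gzip -t "$1" 2>/dev/null; then
    echo "corrupt or not gzip-compressed: $1" >&2
    return 1
  fi
  if ! tar -tzf "$1" >/dev/null 2>&1; then
    echo "not a readable tar archive: $1" >&2
    return 1
  fi
  echo "OK: $1"
}

# Example (not run automatically):
#   fetch_repeatmasker 4.2.4 && verify_tarball RepeatMasker-4.2.4.tar.gz
```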
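02 Extract the Package
You can extract the archive into your home directory or into a shared system directory such as /usr/local/. Do not extract it on top of an existing RepeatMasker directory, or the existing files will be overwritten. Example commands (replace #-# and adjust the file name to match the archive you downloaded):
```bash
cp RepeatMasker-open-4-#-#.tar.gz /usr/local
cd /usr/local
gunzip RepeatMasker-open-4-#-#.tar.gz
tar xvf RepeatMasker-open-4-#-#.tar
```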
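The warning about not extracting over an existing RepeatMasker directory can be enforced in a script. This sketch (hypothetical function `safe_extract`) reads the archive's top-level directory name from its listing instead of assuming it, refuses to continue if that directory already exists at the destination, and otherwise extracts with a single `tar -xzf`, which does the same job as the `gunzip` + `tar xvf` pair.

```bash
#!/usr/bin/env bash
# Extract a .tar.gz only if its top-level directory does not already exist.
# Usage: safe_extract ARCHIVE [DEST_DIR]   (hypothetical helper)
safe_extract() {
  if [ $# -lt 1 ]; then
    echo "usage: safe_extract ARCHIVE [DEST_DIR]" >&2
    return 2
  fi
  local archive=$1 dest=${2:-.} listing top
  listing=$(tar -tzf "$archive") || return 1
  top=${listing%%$'\n'*}   # first entry in the archive
  top=${top#./}            # drop a leading "./" if present
  top=${top%%/*}           # keep only the top-level directory name
  if [ -z "$top" ]; then
    echo "cannot determine top-level directory of $archive" >&2
    return 1
  fi
  if [ -e "$dest/$top" ]; then
    echo "refusing to overwrite existing $dest/$top" >&2
    return 1
  fi
  tar -xzf "$archive" -C "$dest"
}

# Example (not run automatically):
#   safe_extract RepeatMasker-4.2.4.tar.gz /usr/local
```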
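03 Install a Repeat Library
The package does not include a built-in database. You can use your own library directly by adding -lib mylib.fa when running RepeatMasker; the recommended approach, however, is to install the FamDB tool and use it to download and manage the Dfam TE library.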
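When you pass your own library with `-lib mylib.fa`, a quick look at its FASTA headers can save a wasted run. RepeatMasker-style libraries (for example those produced by RepeatModeler) conventionally name each entry `>family#Class/Subclass`, and an entry without a `#` tag has no class information to report. The hypothetical `check_lib` below, a sketch assuming that convention, counts sequences and untagged headers with awk.

```bash
#!/usr/bin/env bash
# Summarize a custom repeat library before passing it to RepeatMasker -lib.
# Convention assumed: headers look like ">name#Class/Subclass".
check_lib() {
  if [ $# -ne 1 ] || [ ! -s "$1" ]; then
    echo "usage: check_lib LIB.fa (file must exist and be non-empty)" >&2
    return 2
  fi
  awk '
    /^>/ { n++; if ($1 !~ /#/) untagged++ }   # $1 is the sequence ID
    END {
      printf "%d sequences, %d without a #class tag\n", n, untagged
      exit (n == 0)                            # fail if no FASTA headers
    }' "$1"
}

# Example (not run automatically):
#   check_lib mylib.fa && RepeatMasker -lib mylib.fa genome.fa
```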
FamDB is a companion tool linking Dfam and RepeatMasker (optional, but strongly recommended). Install it and download the database as follows:
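```bash
# Install the dependency
pip3 install --user h5py
# Download the FamDB source (set the actual release number, e.g. 3.0.0)
FAMDB_VERSION="#.#.#"
wget "https://github.com/Dfam-consortium/FamDB/archive/refs/tags/${FAMDB_VERSION}.tar.gz"
tar zxvf "${FAMDB_VERSION}.tar.gz"
cd "FamDB-${FAMDB_VERSION}"
# Download the Dfam database automatically
python3 utils/download_dfam.py
```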
FamDB releases: https://github.com/Dfam-consortium/FamDB/releases
Dfam 3.9 FamDB files (official site): https://www.dfam.org/releases/Dfam_3.9/families/FamDB/
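After downloading, you can check which Dfam partitions actually landed in the FamDB directory before running configure, which reports them as `Partition N [file]`. The sketch below assumes, as in the configure output later in this post, that each partition is an `.h5` file whose name ends in `.<partition number>.h5` (e.g. `dfam3.9.5.h5`); the function name is hypothetical, and FamDB's own `famdb.py` script remains the authoritative way to inspect the files.

```bash
#!/usr/bin/env bash
# List Dfam partition files in a FamDB directory, sorted by partition number.
# ASSUMPTION: partition files are named <prefix>.<number>.h5, e.g. dfam3.9.5.h5.
list_famdb_partitions() {
  if [ $# -ne 1 ] || [ ! -d "$1" ]; then
    echo "usage: list_famdb_partitions FAMDB_DIR" >&2
    return 2
  fi
  local f base num
  local -a found=()
  for f in "$1"/*.h5; do
    [ -e "$f" ] || continue        # glob matched nothing
    base=${f##*/}                  # file name without directory
    num=${base%.h5}
    num=${num##*.}                 # text after the last remaining dot
    [[ $num =~ ^[0-9]+$ ]] || continue
    found+=("$num $base")
  done
  if [ ${#found[@]} -eq 0 ]; then
    echo "no partition files found in $1" >&2
    return 1
  fi
  printf '%s\n' "${found[@]}" | sort -n | while read -r num base; do
    printf 'Partition %s [%s]\n' "$num" "$base"
  done
}

# Example (path follows the layout shown in the configure output):
#   list_famdb_partitions /usr/local/RepeatMasker/Libraries/famdb
```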
04 Run the Configuration Script
Before first use, run the configuration script to set up the program:
```bash
cd /usr/local/RepeatMasker
perl ./configure
```
Example interactive session (the defaults in square brackets are paths from the author's conda environment and will differ on your system):
```bash
perl ./configure
-- Setting perl interpreter...
RepeatMasker Configuration Program
Checking for libraries...
- Found a FamDB root partition
<PRESS ENTER TO CONTINUE>
The full path including the name for the TRF program.
TRF_PRGM [/mnt/data/home/tycloud/anaconda3/envs/jiegou2/bin/trf]:
Add a Search Engine:
1. Crossmatch: [ Un-configured ]
2. RMBlast: [ Un-configured ]
3. HMMER3.1 & DFAM: [ Un-configured ]
4. ABBlast: [ Un-configured ]
5. Done
Enter Selection: 2
The path to the installation of the RMBLAST sequence alignment program.
RMBLAST_DIR [/mnt/data/home/tycloud/anaconda3/envs/jiegou2/bin]:
Add a Search Engine:
1. Crossmatch: [ Un-configured ]
2. RMBlast: [ Configured, Default ]
3. HMMER3.1 & DFAM: [ Un-configured ]
4. ABBlast: [ Un-configured ]
5. Done
Enter Selection: 3
The path to the HMMER profile HMM search software.
HMMER_DIR [/mnt/data/home/tycloud/anaconda3/envs/jiegou2/bin]:
Do you want HMMER3.1 & DFAM to be your default
search engine for Repeatmasker? (Y/N) [ Y ]: n
Add a Search Engine:
1. Crossmatch: [ Un-configured ]
2. RMBlast: [ Configured, Default ]
3. HMMER3.1 & DFAM: [ Configured ]
4. ABBlast: [ Un-configured ]
5. Done
Enter Selection: 5
Building FASTA version of RepeatMasker.lib .....
Building RMBlast frozen libraries..
The program is installed with a the following repeat libraries:
FamDB Directory : /mnt/data/home/tycloud/anaconda3/envs/jiegou2/share/RepeatMasker/Libraries/famdb
FamDB Format Version: 2.0.0
FamDB Creation Date : 2025-03-07 11:31:57.201792
Database: Dfam
Version : 3.9
Date : 2025-03-10
Dfam - A database of transposable element (TE) sequence alignments and HMMs.
3 Partitions Present
Total consensus sequences present: 320994
Total HMMs present : 320880
Partition Details
-----------------
Partition 0 [dfam3.9.0.h5]: root
Consensi: 237, HMMs: 199
Partition 1 [ Absent ]: Brachycera
Partition 2 [ Absent ]: Archelosauria
Partition 3 [ Absent ]: Hymenoptera
Partition 4 [ Absent ]: Otomorpha
Partition 5 [dfam3.9.5.h5]: rosids
Consensi: 166018, HMMs: 165980
Partition 6 [dfam3.9.6.h5]: Viridiplantae - Saxifragales, asterids, Proteales, Nymphaeales, Amborellales, Caryophyllales, Ranunculales, Mesostigmatophyceae, Chlorokybophyceae, Charophyceae, Lycopodiopsida, Chlorophyta, Liliopsida, Polypodiopsida, Marchantiophyta, Acrogymnospermae, Bryophyta
Consensi: 154739, HMMs: 154701
Partition 7 [ Absent ]: Mammalia
Partition 8 [ Absent ]: Noctuoidea
Partition 9 [ Absent ]: Obtectomera - Bombycoidea, Papilionoidea, Pyraloidea, Hesperioidea, Geometroidea, Drepanoidea, Pterophoroidea
Partition 10 [ Absent ]: Eupercaria
Partition 11 [ Absent ]: Ctenosquamata - Ovalentaria, Myctophata, Lampridacea, Carangaria, Holocentrimorphaceae, Batrachoidaria, Anabantaria, Paracanthopterygii, Ophidiaria, Gobiaria, Syngnathiaria, Pelagiaria
Partition 12 [ Absent ]: Vertebrata <vertebrates> - Chondrichthyes, Lepidosauria, Protacanthopterygii, Coelacanthimorpha, Amphibia, Cladistia, Holostei, Cyclostomata <vertebrates>, Osteoglossocephala, Stomiati, Dipnomorpha, Elopocephalai, Chondrostei
Partition 13 [ Absent ]: Coleoptera
Partition 14 [ Absent ]: Endopterygota - Gelechioidea, Yponomeutoidea, Incurvarioidea, Tineoidea, Apoditrysia, Nematocera, Strepsiptera, Neuropterida, Siphonaptera, Trichoptera
Partition 15 [ Absent ]: Protostomia - Nematoda, Chelicerata, Collembola, Polyneoptera, Monocondylia, Palaeoptera, Crustacea, Paraneoptera, Myriapoda, Scalidophora, Spiralia
Partition 16 [ Absent ]: Riboviria - Fungi, Cnidaria, Discoba, Sar, Amoebozoa, Metamonada, Filasterea, Polydnaviriformidae, Rotosphaerida, Cryptophyceae, Hemichordata, unclassified viruses, Choanoflagellata, Ichthyosporea, Rhodophyta, Tunicata, Cephalochordata, Ctenophora <comb jellies>, Placozoa, Apusozoa, Porifera, Haptista, Naldaviricetes, Bacteria <bacteria>, Echinodermata, Varidnaviria, Riboviria
Further documentation on the program may be found here:
/mnt/data/home/tycloud/anaconda3/envs/jiegou2/share/RepeatMasker/repeatmasker.help
```
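On servers or in batch scripts an interactive prompt is inconvenient. To our knowledge, recent RepeatMasker releases can also take these settings as command-line options to `configure`, but the option names used here (`-trf_prgm`, `-rmblast_dir`, `-hmmer_dir`, `-default_search_engine`) are an assumption; check the option list in your copy of `configure` before relying on them. The hypothetical helper below only prints such a command, reproducing the choices from the session above (RMBlast as default, HMMER optional), so you can review it before running it.

```bash
#!/usr/bin/env bash
# Print (do not run) a non-interactive configure command.
# ASSUMPTION: your configure accepts these options; verify before use.
# Usage: build_configure_cmd TRF_PATH RMBLAST_DIR [HMMER_DIR]
build_configure_cmd() {
  if [ $# -lt 2 ]; then
    echo "usage: build_configure_cmd TRF_PATH RMBLAST_DIR [HMMER_DIR]" >&2
    return 2
  fi
  local -a cmd=(perl ./configure -trf_prgm "$1" -rmblast_dir "$2")
  if [ -n "${3:-}" ]; then
    cmd+=(-hmmer_dir "$3")
  fi
  # RMBlast stays the default engine, as in the interactive session
  cmd+=(-default_search_engine rmblast)
  local out
  printf -v out '%q ' "${cmd[@]}"   # %q quotes paths that contain spaces
  printf '%s\n' "${out% }"          # drop the trailing space
}

# Example, using the paths from the session above:
#   build_configure_cmd /mnt/data/home/tycloud/anaconda3/envs/jiegou2/bin/trf \
#     /mnt/data/home/tycloud/anaconda3/envs/jiegou2/bin
```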