Sqoop Command Syntax Reference

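help command
list-databases
- Parameter description
- Example
codegen
- Parameter description
Sqoop create-hive-table
- Parameter description
eval
- Parameter description
Export
- Parameter description
import
- Parameter description
import-all-tables
- Parameter description
import-mainframe
- Parameter description
job
- Parameter description
list-tables
- Parameter description
merge
- Parameter description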

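help command

sqoop help [command]

The optional [command] is one of the Sqoop commands listed below.

Sqoop command	Description
`codegen`	Generate code to interact with database records
`create-hive-table`	Import a table definition into Hive
`eval`	Evaluate a SQL statement and display the results
`export`	Export an HDFS directory to a database table
`help`	List available commands
`import`	Import a table from a database to HDFS
`import-all-tables`	Import all tables from a database to HDFS
`import-mainframe`	Import datasets from a mainframe server to HDFS
`job`	Work with saved jobs
`list-databases`	List available databases on a server
`list-tables`	List available tables in a database
`merge`	Merge results of incremental imports
`metastore`	Run a standalone Sqoop metastore
`version`	Display version information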
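As a quick illustration of the usage line above, the sketch below prints two help invocations. The `run` helper is a hypothetical dry-run wrapper, not part of Sqoop: it only echoes the command unless `RUN=1` is set, so the snippet can be pasted on a machine without Sqoop installed.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# List every available Sqoop command.
run sqoop help

# Show the arguments accepted by a single command, here `import`.
run sqoop help import
```

Setting `RUN=1` before sourcing the snippet executes the commands for real.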

list-databases

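Parameter description

Parameter	Description
`--connect <jdbc-uri>`	Specify JDBC connect string
`--connection-manager <class-name>`	Specify connection manager class name
`--connection-param-file <properties-file>`	Specify connection parameters file
`--driver <class-name>`	Manually specify JDBC driver class to use
`--hadoop-home <hdir>`	Override $HADOOP_MAPRED_HOME_ARG
`--hadoop-mapred-home <dir>`	Override $HADOOP_MAPRED_HOME_ARG
`--help`	Print usage instructions
`--metadata-transaction-isolation-level <isolationlevel>`	Define the transaction isolation level for metadata queries
`--oracle-escaping-disabled <boolean>`	Disable the escaping mechanism of the Oracle/OraOop connection managers
`-P`	Read password from console
`--password <password>`	Set authentication password
`--password-alias <password-alias>`	Credential provider password alias
`--password-file <password-file>`	Set authentication password file path
`--relaxed-isolation`	Use read-uncommitted isolation for imports
`--skip-dist-cache`	Skip copying jars to distributed cache
`--temporary-rootdir <rootdir>`	Define the temporary root directory for the import
`--throw-on-error`	Rethrow a RuntimeException if an error occurs during the job
`--username <username>`	Set authentication username
`--verbose`	Print more information while working
`-conf <configuration file>`	Specify an application configuration file (generic Hadoop command-line argument)
`-D <property=value>`	Define a value for a given property (generic Hadoop command-line argument)
`-fs <file:///|hdfs://namenode:port>`	Specify the default filesystem URL to use (generic Hadoop command-line argument)
`-jt <local|resourcemanager:port>`	Specify a ResourceManager (generic Hadoop command-line argument)
`-files <file1,...>`	Specify a comma-separated list of files to be copied to the MapReduce cluster (generic argument)
`-libjars <jar1,...>`	Specify a comma-separated list of jar files to be included in the classpath (generic argument)
`-archives <archive1,...>`	Specify a comma-separated list of archives to be unarchived on the compute machines (generic argument)

Some of these are generic Hadoop command-line arguments; they must come before any tool-specific arguments.

Example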

sqoop list-databases --connect jdbc:mysql://hadoop100:3306/ --username root --password password
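The ordering rule stated above (generic Hadoop arguments before tool-specific ones) can be sketched as follows. This is only an illustration: `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set, the queue property value is a placeholder, and `-P` is used instead of `--password` so the password is read from the console.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# The generic argument (-D ...) comes right after the tool name,
# before tool-specific arguments such as --connect and --username.
run sqoop list-databases \
  -D mapreduce.job.queuename=default \
  --connect jdbc:mysql://hadoop100:3306/ \
  --username root \
  -P
```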

codegen

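Parameter description

Category	Parameter	Description
Common arguments	`--connect <jdbc-uri>`	Specify JDBC connect string
	`--connection-manager <class-name>`	Specify connection manager class name
	`--connection-param-file <properties-file>`	Specify connection parameters file
	`--driver <class-name>`	Manually specify JDBC driver class to use
	`--hadoop-home <hdir>`	Override the $HADOOP_MAPRED_HOME_ARG environment variable; specifies the Hadoop installation directory
	`--hadoop-mapred-home <dir>`	Override the $HADOOP_MAPRED_HOME_ARG environment variable; specifies the Hadoop MapReduce installation directory
	`--help`	Print usage instructions
	`--metadata-transaction-isolation-level <isolationlevel>`	Define the transaction isolation level for metadata queries
	`--oracle-escaping-disabled <boolean>`	Disable the escaping mechanism of the Oracle/OraOop connection managers
	`-P`	Read password from console
	`--password <password>`	Set authentication password
	`--password-alias <password-alias>`	Credential provider password alias
	`--password-file <password-file>`	Set authentication password file path
	`--relaxed-isolation`	Use read-uncommitted isolation for imports
	`--skip-dist-cache`	Skip copying jars to distributed cache
	`--temporary-rootdir <rootdir>`	Define the temporary root directory for the import
	`--throw-on-error`	Rethrow a RuntimeException if an error occurs during the job
	`--username <username>`	Set authentication username
	`--verbose`	Print more information while working
Code generation arguments	`--bindir <dir>`	Output directory for compiled objects
	`--class-name <name>`	Set the generated class name; overrides --package-name. When combined with --jar-file, sets the input class
	`-e,--query <statement>`	SQL statement to generate code for
	`--escape-mapping-column-names <boolean>`	Disable escaping of special characters in column names
	`--input-null-non-string <null-str>`	Input null non-string representation
	`--input-null-string <null-str>`	Input null string representation
	`--map-column-java <arg>`	Override mapping for specific columns to Java types
	`--null-non-string <null-str>`	Null non-string representation
	`--null-string <null-str>`	Null string representation
	`--outdir <dir>`	Output directory for generated code
	`--package-name <name>`	Put auto-generated classes in this package
	`--table <table-name>`	Table to generate code for
Output line formatting arguments	`--enclosed-by <char>`	Set a required field enclosing character
	`--escaped-by <char>`	Set the escape character
	`--fields-terminated-by <char>`	Set the field separator character
	`--lines-terminated-by <char>`	Set the end-of-line character
	`--mysql-delimiters`	Use MySQL's default delimiter set
	`--optionally-enclosed-by <char>`	Set an optional field enclosing character
Input parsing arguments	`--input-enclosed-by <char>`	Set a required field encloser for input
	`--input-escaped-by <char>`	Set the input escape character
	`--input-fields-terminated-by <char>`	Set the input field separator
	`--input-lines-terminated-by <char>`	Set the input end-of-line character
	`--input-optionally-enclosed-by <char>`	Set an optional field enclosing character for input
Hive arguments	`--create-hive-table`	Fail if the target Hive table exists
	`--external-table-dir <hdfs path>`	Set where the external table is in HDFS
	`--hive-database <database-name>`	Set the database name to use when importing to Hive
	`--hive-delims-replacement <arg>`	Replace Hive record \0x01 and row delimiters (\n\r) in imported string fields with a user-defined string
	`--hive-drop-import-delims`	Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
	`--hive-home <dir>`	Override the $HIVE_HOME environment variable
	`--hive-import`	Import tables into Hive (uses Hive's default delimiters if none are set)
	`--hive-overwrite`	Overwrite existing data in the Hive table
	`--hive-partition-key <partition-key>`	Set the partition key to use when importing to Hive
	`--hive-partition-value <partition-value>`	Set the partition value to use when importing to Hive
	`--hive-table <table-name>`	Set the table name to use when importing to Hive
	`--map-column-hive <arg>`	Override mapping for specific columns to Hive types
HCatalog arguments	`--hcatalog-database <arg>`	HCatalog database name
	`--hcatalog-home <hdir>`	Override the $HCAT_HOME environment variable; specifies the HCatalog installation directory
	`--hcatalog-partition-keys <partition-key>`	Set the partition keys to use when importing to Hive
	`--hcatalog-partition-values <partition-value>`	Set the partition values to use when importing to Hive
	`--hcatalog-table <arg>`	HCatalog table name
Generic Hadoop command-line arguments	`-conf <configuration file>`	Specify an application configuration file
	`-D <property=value>`	Define a value for a given property
	`-fs <file:///|hdfs://namenode:port>`	Specify the default filesystem URL to use; overrides the 'fs.defaultFS' property from configurations
	`-jt <local|resourcemanager:port>`	Specify a ResourceManager
	`-files <file1,...>`	Specify a comma-separated list of files to be copied to the MapReduce cluster
	`-libjars <jar1,...>`	Specify a comma-separated list of jar files to be included in the classpath
	`-archives <archive1,...>`	Specify a comma-separated list of archives to be unarchived on the compute machines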
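A minimal sketch of a codegen call built only from arguments in the table above. The database `testdb`, the table `employees`, the output directories, and the package name are placeholders, and `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Generate the Java record class for one table.
# --outdir receives the generated source, --bindir the compiled objects.
run sqoop codegen \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root \
  -P \
  --table employees \
  --outdir ./generated-src \
  --bindir ./generated-classes \
  --package-name com.example.sqoop
```

`--package-name` is used rather than `--class-name` because, per the table, setting the class name overrides the package.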

Sqoop create-hive-table

Parameter description

Category	Parameter	Description
Common arguments	--connect	Specify JDBC connect string
	--connection-manager	Specify connection manager class name
	--connection-param-file	Specify connection parameters file
	--driver	Manually specify JDBC driver class to use
	--hadoop-home	Override $HADOOP_MAPRED_HOME_ARG
	--hadoop-mapred-home	Override $HADOOP_MAPRED_HOME_ARG
	--help	Print usage instructions
	--metadata-transaction-isolation-level	Defines the transaction isolation level for metadata queries
	--oracle-escaping-disabled	Disable the escaping mechanism of the Oracle/OraOop connection managers
	-P	Read password from console
	--password	Set authentication password
	--password-alias	Credential provider password alias
	--password-file	Set authentication password file path
	--relaxed-isolation	Use read-uncommitted isolation for imports
	--skip-dist-cache	Skip copying jars to distributed cache
	--temporary-rootdir	Defines the temporary root directory for the import
	--throw-on-error	Rethrow a RuntimeException on error occurred during the job
	--username	Set authentication username
	--verbose	Print more information while working
Hive arguments	--create-hive-table	Fail if the target hive table exists
	--external-table-dir	Sets where the external table is in HDFS
	--hive-database	Sets the database name to use when importing to hive
	--hive-delims-replacement	Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string
	--hive-drop-import-delims	Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
	--hive-home	Override $HIVE_HOME
	--hive-overwrite	Overwrite existing data in the Hive table
	--hive-partition-key	Sets the partition key to use when importing to hive
	--hive-partition-value	Sets the partition value to use when importing to hive
	--hive-table	Sets the table name to use when importing to hive
	--map-column-hive	Override mapping for specific column to hive types
	--table	The db table to read the definition from
Output line formatting arguments	--enclosed-by	Sets a required field enclosing character
	--escaped-by	Sets the escape character
	--fields-terminated-by	Sets the field separator character
	--lines-terminated-by	Sets the end-of-line character
	--mysql-delimiters	Uses MySQL's default delimiter set
	--optionally-enclosed-by	Sets a field enclosing character
Generic Hadoop command-line arguments	-conf	specify an application configuration file
	-D <property=value>	define a value for a given property
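	-fs <file:///|hdfs://namenode:port>	specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations
	-jt <local|resourcemanager:port>	specify a ResourceManager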
	-files <file1,...>	specify a comma-separated list of files to be copied to the map reduce cluster
	-libjars <jar1,...>	specify a comma-separated list of jar files to be included in the classpath
	-archives <archive1,...>	specify a comma-separated list of archives to be unarchived on the compute machines
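The sketch below shows how the arguments above might combine to copy one table definition into Hive. The database `testdb`, the table `employees`, and the comma delimiter are placeholders; `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Read the definition of the source table and create a matching Hive table.
# Only the table definition is transferred here, not the rows.
run sqoop create-hive-table \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root \
  -P \
  --table employees \
  --hive-database default \
  --hive-table employees \
  --fields-terminated-by ','
```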

eval

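Parameter description

Category	Parameter	Description
Common arguments	--connect	Specify JDBC connect string
	--connection-manager	Specify connection manager class name
	--connection-param-file	Specify connection parameters file
	--driver	Manually specify JDBC driver class to use
	--hadoop-home	Override $HADOOP_MAPRED_HOME_ARG
	--hadoop-mapred-home	Override $HADOOP_MAPRED_HOME_ARG
	--help	Print usage instructions
	--metadata-transaction-isolation-level	Define the transaction isolation level for metadata queries
	--oracle-escaping-disabled	Disable the escaping mechanism of the Oracle/OraOop connection managers
	-P	Read password from console
	--password	Set authentication password
	--password-alias	Credential provider password alias
	--password-file	Set authentication password file path
	--relaxed-isolation	Use read-uncommitted isolation for imports
	--skip-dist-cache	Skip copying jars to distributed cache
	--temporary-rootdir	Define the temporary root directory for the import
	--throw-on-error	Rethrow a RuntimeException if an error occurs during the job
	--username	Set authentication username
	--verbose	Print more information while working
SQL evaluation arguments	-e,--query	Execute 'statement' in SQL and exit
Generic Hadoop command-line arguments	-conf	Specify an application configuration file
	-D <property=value>	Define a value for a given property
	-fs <file:///|hdfs://namenode:port>	Specify the default filesystem URL to use; overrides the 'fs.defaultFS' property from configurations
	-jt <local|resourcemanager:port>	Specify a ResourceManager
	-files <file1,...>	Specify a comma-separated list of files to be copied to the MapReduce cluster
	-libjars <jar1,...>	Specify a comma-separated list of jar files to be included in the classpath
	-archives <archive1,...>	Specify a comma-separated list of archives to be unarchived on the compute machines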
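To make the `-e,--query` row concrete, here is a small sketch that runs a single statement and exits. The database `testdb` and table `employees` are placeholders; `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set. In dry-run mode the printed line drops the quotes around the statement.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Evaluate one SQL statement against the database and display the result.
# The statement is quoted so the shell passes it as a single argument.
run sqoop eval \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root \
  -P \
  --query "SELECT COUNT(*) FROM employees"
```

This can serve as a quick connectivity and permissions check before starting a longer import.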

Export

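Parameter description

Category	Parameter	Description
Common arguments	--connect	Specify JDBC connect string
	--connection-manager	Specify connection manager class name
	--connection-param-file	Specify connection parameters file
	--driver	Manually specify JDBC driver class to use
	--hadoop-home	Override $HADOOP_MAPRED_HOME_ARG
	--hadoop-mapred-home	Override $HADOOP_MAPRED_HOME_ARG
	--help	Print usage instructions
	--metadata-transaction-isolation-level	Define the transaction isolation level for metadata queries
	--oracle-escaping-disabled	Disable the escaping mechanism of the Oracle/OraOop connection managers
	-P	Read password from console
	--password	Set authentication password
	--password-alias	Credential provider password alias
	--password-file	Set authentication password file path
	--relaxed-isolation	Use read-uncommitted isolation for imports
	--skip-dist-cache	Skip copying jars to distributed cache
	--temporary-rootdir	Define the temporary root directory for the import
	--throw-on-error	Rethrow a RuntimeException if an error occurs during the job
	--username	Set authentication username
	--verbose	Print more information while working
Export control arguments	--batch	Execute the underlying statements in batch mode
	--call	Populate the table using this stored procedure (one call per row)
	--clear-staging-table	Indicate that any data in the staging table can be deleted
	--columns <col,col,col...>	Columns to export to the table
	--direct	Use direct export fast path
	--export-dir	HDFS source path for the export
	-m,--num-mappers	Use 'n' map tasks to export in parallel
	--mapreduce-job-name	Set name for the generated MapReduce job
	--staging-table	Intermediate staging table
	--table	Table to populate
	--update-key	Update records by the specified key column
	--update-mode	Specify how updates are performed when new rows are found with non-matching keys in the database
	--validate	Validate the copy using the configured validator
	--validation-failurehandler	Fully qualified class name of the validation failure handler
	--validation-threshold	Fully qualified class name of the validation threshold
	--validator	Fully qualified class name of the validator
Input parsing arguments	--input-enclosed-by	Set a required field encloser for input
	--input-escaped-by	Set the input escape character
	--input-fields-terminated-by	Set the input field separator
	--input-lines-terminated-by	Set the input end-of-line character
	--input-optionally-enclosed-by	Set an optional field enclosing character for input
Output line formatting arguments	--enclosed-by	Set a required field enclosing character
	--escaped-by	Set the escape character
	--fields-terminated-by	Set the field separator character
	--lines-terminated-by	Set the end-of-line character
	--mysql-delimiters	Use MySQL's default delimiter set
	--optionally-enclosed-by	Set an optional field enclosing character
Code generation arguments	--bindir	Output directory for compiled objects
	--class-name	Set the generated class name; overrides --package-name
	--escape-mapping-column-names	Disable escaping of special characters in column names
	--input-null-non-string	Input null non-string representation
	--input-null-string	Input null string representation
	--jar-file	Disable code generation; use the specified jar
	--map-column-java	Override mapping for specific columns to Java types
	--null-non-string	Null non-string representation
	--null-string	Null string representation
	--outdir	Output directory for generated code
	--package-name	Put auto-generated classes in this package
HCatalog arguments	--hcatalog-database	HCatalog database name
	--hcatalog-home	Override $HCAT_HOME
	--hcatalog-partition-keys	Set the partition keys to use when importing to Hive
	--hcatalog-partition-values	Set the partition values to use when importing to Hive
	--hcatalog-table	HCatalog table name
	--hive-home	Override $HIVE_HOME
	--hive-partition-key	Set the partition key to use when importing to Hive
	--hive-partition-value	Set the partition value to use when importing to Hive
	--map-column-hive	Override mapping for specific columns to Hive types
Generic Hadoop command-line arguments	-conf	Specify an application configuration file
	-D <property=value>	Define a value for a given property
	-fs <file:///|hdfs://namenode:port>	Specify the default filesystem URL to use
	-jt <local|resourcemanager:port>	Specify a ResourceManager
	-files <file1,...>	Specify a comma-separated list of files to be copied to the MapReduce cluster
	-libjars <jar1,...>	Specify a comma-separated list of jar files to be included in the classpath
	-archives <archive1,...>	Specify a comma-separated list of archives to be unarchived on the compute machines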
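As an illustration of the export control arguments, the sketch below pushes a comma-delimited HDFS directory into a database table. The database `testdb`, the table `employees_export`, the HDFS path, and the mapper count are placeholders; `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Export the files under --export-dir into the target table.
# --input-fields-terminated-by tells Sqoop how to parse the HDFS records.
run sqoop export \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root \
  -P \
  --table employees_export \
  --export-dir /user/hadoop/employees \
  --input-fields-terminated-by ',' \
  --num-mappers 4
```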

import

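Parameter description

Category	Parameter	Description
Common arguments	--connect	Specify JDBC connect string
	--connection-manager	Specify connection manager class name
	--connection-param-file	Specify connection parameters file
	--driver	Manually specify JDBC driver class to use
	--hadoop-home	Override $HADOOP_MAPRED_HOME_ARG
	--hadoop-mapred-home	Override $HADOOP_MAPRED_HOME_ARG
	--help	Print usage instructions
	--metadata-transaction-isolation-level	Define the transaction isolation level for metadata queries
	--oracle-escaping-disabled	Disable the escaping mechanism of the Oracle/OraOop connection managers
	-P	Read password from console
	--password	Set authentication password
	--password-alias	Credential provider password alias
	--password-file	Set authentication password file path
	--relaxed-isolation	Use read-uncommitted isolation for imports
	--skip-dist-cache	Skip copying jars to distributed cache
	--temporary-rootdir	Define the temporary root directory for the import
	--throw-on-error	Rethrow a RuntimeException if an error occurs during the job
	--username	Set authentication username
	--verbose	Print more information while working
Import control arguments	--append	Import data in append mode
	--as-avrodatafile	Import data to Avro data files
	--as-parquetfile	Import data to Parquet files
	--as-sequencefile	Import data to SequenceFiles
	--as-textfile	Import data as plain text (default)
	--autoreset-to-one-mapper	Reset the number of mappers to one if no split key is available
	--boundary-query	Set the boundary query for retrieving the max and min values of the primary key
	--columns <col,col,col...>	Columns to import from the table
	--compression-codec	Compression codec to use for the import
	--delete-target-dir	Import data in delete mode (an existing target directory is deleted first)
	--direct	Use direct import fast path
	--direct-split-size	Split the input stream every 'n' bytes when importing in direct mode
	-e,--query	Import the results of a SQL statement
	--fetch-size	Fetch 'n' rows from the database when more rows are needed
	--inline-lob-limit	Set the maximum size for an inline LOB
	-m,--num-mappers	Use 'n' map tasks to import in parallel
	--mapreduce-job-name	Set name for the generated MapReduce job
	--merge-key	Key column to use to join results
	--split-by	Column of the table used to split work units
	--split-limit	Upper limit of rows per split for split columns of Date/Time/Timestamp and integer types. For date or timestamp fields it is calculated in seconds. split-limit should be greater than 0
	--table	Table to read
	--target-dir	HDFS plain table destination directory
	--validate	Validate the copy using the configured validator
	--validation-failurehandler	Fully qualified class name of the validation failure handler
	--validation-threshold	Fully qualified class name of the validation threshold
	--validator	Fully qualified class name of the validator
	--warehouse-dir	HDFS parent directory for the table destination
	--where	WHERE clause to use during the import
	-z,--compress	Enable compression
Incremental import arguments	--check-column	Source column to check for incremental change
	--incremental	Define an incremental import of type 'append' or 'lastmodified'
	--last-value	Last imported value in the incremental check column
Output line formatting arguments	--enclosed-by	Set a required field enclosing character
	--escaped-by	Set the escape character
	--fields-terminated-by	Set the field separator character
	--lines-terminated-by	Set the end-of-line character
	--mysql-delimiters	Use MySQL's default delimiter set
	--optionally-enclosed-by	Set an optional field enclosing character
Input parsing arguments	--input-enclosed-by	Set a required field encloser for input
	--input-escaped-by	Set the input escape character
	--input-fields-terminated-by	Set the input field separator
	--input-lines-terminated-by	Set the input end-of-line character
	--input-optionally-enclosed-by	Set an optional field enclosing character for input
Hive arguments	--create-hive-table	Fail if the target Hive table exists
	--external-table-dir	Set where the external table is in HDFS
	--hive-database	Database name to use when importing to Hive
	--hive-delims-replacement	Replace Hive record \0x01 and row delimiters (\n\r) in imported string fields with a user-defined string
	--hive-drop-import-delims	Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
	--hive-home	Override $HIVE_HOME
	--hive-import	Import tables into Hive
	--hive-overwrite	Overwrite existing data in the Hive table
	--hive-partition-key	Partition key to use when importing to Hive
	--hive-partition-value	Partition value to use when importing to Hive
	--hive-table	Table name to use when importing to Hive
	--map-column-hive	Override mapping for specific columns to Hive types
HBase arguments	--column-family	Set the target column family for the import
	--hbase-bulkload	Enable HBase bulk loading
	--hbase-create-table	If specified, create missing HBase tables
	--hbase-row-key	Specify which input column to use as the row key
	--hbase-table	HBase table to import into
HCatalog arguments	--hcatalog-database	HCatalog database name
	--hcatalog-home	Override $HCAT_HOME
	--hcatalog-partition-keys	Partition keys to use when importing to Hive
	--hcatalog-partition-values	Partition values to use when importing to Hive
	--hcatalog-table	HCatalog table name
	--map-column-hive	Override mapping for specific columns to Hive types
HCatalog import specific options	--create-hcatalog-table	Create the HCatalog table before import
	--drop-and-create-hcatalog-table	Drop and create the HCatalog table before import
	--hcatalog-storage-stanza	HCatalog storage stanza for table creation
Accumulo arguments	--accumulo-batch-size	Batch size in bytes
	--accumulo-column-family	Set the target column family for the import
	--accumulo-create-table	If specified, create missing Accumulo tables
	--accumulo-instance	Accumulo instance name
	--accumulo-max-latency	Max write latency in milliseconds
	--accumulo-password	Accumulo password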
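The import control and incremental arguments above might be combined as in the sketch below: one full import, then one incremental run. The database `testdb`, the table `employees`, the column `id`, the HDFS path, and the last value `1000` are placeholders; `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Full import: replace the target directory and split the work on `id`.
run sqoop import \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root \
  -P \
  --table employees \
  --target-dir /user/hadoop/employees \
  --delete-target-dir \
  --fields-terminated-by ',' \
  --split-by id \
  --num-mappers 4

# Incremental import: append only rows whose `id` is above the last value.
run sqoop import \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root \
  -P \
  --table employees \
  --target-dir /user/hadoop/employees \
  --incremental append \
  --check-column id \
  --last-value 1000
```

The second command leaves out `--delete-target-dir` so the rows already imported stay in place.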

import-all-tables

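Parameter description

Sqoop import-all-tables parameter	Description
--connect	Specify JDBC connect string
--connection-manager	Specify connection manager class name
--connection-param-file	Specify connection parameters file
--driver	Manually specify JDBC driver class to use
--hadoop-home	Override $HADOOP_MAPRED_HOME_ARG
--hadoop-mapred-home	Override $HADOOP_MAPRED_HOME_ARG
--help	Print usage instructions
--metadata-transaction-isolation-level	Define the transaction isolation level for metadata queries
--oracle-escaping-disabled	Disable the escaping mechanism of the Oracle/OraOop connection managers
-P	Read password from console
--password	Set authentication password
--password-alias	Credential provider password alias
--password-file	Set authentication password file path
--relaxed-isolation	Use read-uncommitted isolation for imports
--skip-dist-cache	Skip copying jars to distributed cache
--temporary-rootdir	Define the temporary root directory for the import
--throw-on-error	Rethrow a RuntimeException if an error occurs during the job
--username	Set authentication username
--verbose	Print more information while working
--as-avrodatafile	Import data to Avro data files
--as-parquetfile	Import data to Parquet files
--as-sequencefile	Import data to SequenceFiles
--as-textfile	Import data as plain text (default)
--autoreset-to-one-mapper	Reset the number of mappers to one if no split key is available
--compression-codec	Compression codec to use for the import
--direct	Use direct import fast path
--direct-split-size	Split the input stream every 'n' bytes when importing in direct mode
--exclude-tables	Tables to exclude when importing all tables
--fetch-size	Fetch 'n' rows from the database when more rows are needed
--inline-lob-limit	Set the maximum size for an inline LOB
-m,--num-mappers	Use 'n' map tasks to import in parallel
--mapreduce-job-name	Set name for the generated MapReduce job
--warehouse-dir	HDFS parent directory for the table destinations
-z,--compress	Enable compression
--enclosed-by	Set a required field enclosing character
--escaped-by	Set the escape character
--fields-terminated-by	Set the field separator character
--lines-terminated-by	Set the end-of-line character
--mysql-delimiters	Use MySQL's default delimiter set
--optionally-enclosed-by	Set an optional field enclosing character
--input-enclosed-by	Set a required field encloser for input
--input-escaped-by	Set the input escape character
--input-fields-terminated-by	Set the input field separator
--input-lines-terminated-by	Set the input end-of-line character
--input-optionally-enclosed-by	Set an optional field enclosing character for input
--create-hive-table	Fail if the target Hive table exists
--external-table-dir	Set where the external table is in HDFS
--hive-database	Database name to use when importing to Hive
--hive-delims-replacement	Replace Hive record \0x01 and row delimiters (\n\r) in imported string fields with a user-defined string
--hive-drop-import-delims	Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
--hive-home	Override $HIVE_HOME
--hive-import	Import tables into Hive
--hive-overwrite	Overwrite existing data in the Hive table
--hive-partition-key	Partition key to use when importing to Hive
--hive-partition-value	Partition value to use when importing to Hive
--hive-table	Table name to use when importing to Hive
--map-column-hive	Override mapping for specific columns to Hive types
--column-family	Set the target column family for the import
--hbase-bulkload	Enable HBase bulk loading
--hbase-create-table	If specified, create missing HBase tables
--hbase-row-key	Specify which input column to use as the row key
--hbase-table	HBase table to import into
--hcatalog-database	HCatalog database name
--hcatalog-home	Override $HCAT_HOME
--hcatalog-partition-keys	Partition keys to use when importing to Hive
--hcatalog-partition-values	Partition values to use when importing to Hive
--hcatalog-table	HCatalog table name
--create-hcatalog-table	Create the HCatalog table before import
--drop-and-create-hcatalog-table	Drop and create the HCatalog table before import
--hcatalog-storage-stanza	HCatalog storage stanza for table creation
--accumulo-batch-size	Batch size in bytes
--accumulo-column-family	Set the target column family for the import
--accumulo-create-table	If specified, create missing Accumulo tables
--accumulo-instance	Accumulo instance name
--accumulo-max-latency	Max write latency in milliseconds
--accumulo-password	Accumulo password
--accumulo-row-key	Specify which input column to use as the row key
--accumulo-table	Accumulo table to import into
--accumulo-user	Accumulo user name
--accumulo-visibility	Visibility token to apply to all imported rows
--accumulo-zookeepers	Comma-separated list of ZooKeeper servers (host:port)
--bindir	Output directory for compiled objects
--escape-mapping-column-names	Disable escaping of special characters in column names
--input-null-non-string	Input null non-string representation
--input-null-string	Input null string representation
--jar-file	Disable code generation; use the specified jar
--map-column-java	Override mapping for specific columns to Java types
--null-non-string	Null non-string representation
--null-string	Null string representation
--outdir	Output directory for generated code
--package-name	Put auto-generated classes in this package
-conf	Specify an application configuration file
-D <property=value>	Define a value for a given property
-fs <file:///|hdfs://namenode:port>	Specify the default filesystem URL to use; overrides the 'fs.defaultFS' property from configurations
-jt <local|resourcemanager:port>	Specify a ResourceManager
-files <file1,...>	Specify a comma-separated list of files to be copied to the MapReduce cluster
-libjars <jar1,...>	Specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>	Specify a comma-separated list of archives to be unarchived on the compute machines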
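A possible use of `--warehouse-dir` and `--exclude-tables` from the table above is sketched here. The database `testdb`, the parent directory, and the excluded table names `audit_log` and `tmp_staging` are placeholders; `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Import every table in the database except the excluded ones.
# Each table is written under the --warehouse-dir parent directory.
run sqoop import-all-tables \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root \
  -P \
  --warehouse-dir /user/hadoop/warehouse \
  --exclude-tables audit_log,tmp_staging \
  --autoreset-to-one-mapper
```

`--autoreset-to-one-mapper` is included so that, as described in the table, tables with no usable split key fall back to a single mapper.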

import-mainframe

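Parameter description

Category	Parameter	Description
Common arguments	--connect	Specify JDBC connect string
	--connection-manager	Specify connection manager class name
	--connection-param-file	Specify connection parameters file
	--driver	Manually specify JDBC driver class to use
	--hadoop-home	Override $HADOOP_MAPRED_HOME_ARG
	--hadoop-mapred-home	Override $HADOOP_MAPRED_HOME_ARG
	--help	Print usage instructions
	--metadata-transaction-isolation-level	Define the transaction isolation level for metadata queries
	--oracle-escaping-disabled	Disable the escaping mechanism of the Oracle/OraOop connection managers
	-P	Read password from console
	--password	Set authentication password
	--password-alias	Credential provider password alias
	--password-file	Set authentication password file path
	--relaxed-isolation	Use read-uncommitted isolation for imports
	--skip-dist-cache	Skip copying jars to distributed cache
	--temporary-rootdir	Define the temporary root directory for the import
	--throw-on-error	Rethrow a RuntimeException if an error occurs during the job
	--username	Set authentication username
	--verbose	Print more information while working
Import mainframe control arguments	--as-textfile	Import data as plain text (default)
	--compression-codec	Compression codec to use for the import
	--dataset	Datasets to import
	--datasettype	Dataset type (p=partitioned data set, s=sequential data set, g=GDG)
	--delete-target-dir	Import data in delete mode (an existing target directory is deleted first)
	-m,--num-mappers	Use 'n' map tasks to import in parallel
	--mapreduce-job-name	Set name for the generated MapReduce job
	--tape	Dataset is on tape (true, false)
	--target-dir	HDFS plain file destination directory
	--validate	Validate the copy using the configured validator
	--validation-failurehandler	Fully qualified class name of the validation failure handler
	--validation-threshold	Fully qualified class name of the validation threshold
	--validator	Fully qualified class name of the validator
	--warehouse-dir	HDFS parent directory for the file destination
	-z,--compress	Enable compression
Output line formatting arguments	--enclosed-by	Set a required field enclosing character
	--escaped-by	Set the escape character
	--fields-terminated-by	Set the field separator character
	--lines-terminated-by	Set the end-of-line character
	--mysql-delimiters	Use MySQL's default delimiter set
	--optionally-enclosed-by	Set an optional field enclosing character
Input parsing arguments	--input-enclosed-by	Set a required field encloser for input
	--input-escaped-by	Set the input escape character
	--input-fields-terminated-by	Set the input field separator
	--input-lines-terminated-by	Set the input end-of-line character
	--input-optionally-enclosed-by	Set an optional field enclosing character for input
Hive arguments	--create-hive-table	Fail if the target Hive table exists
	--external-table-dir	Set where the external table is in HDFS
	--hive-database	Database name to use when importing to Hive
	--hive-delims-replacement	Replace Hive record \0x01 and row delimiters (\n\r) in imported string fields with a user-defined string
	--hive-drop-import-delims	Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
	--hive-home	Override $HIVE_HOME
	--hive-import	Import tables into Hive
	--hive-overwrite	Overwrite existing data in the Hive table
	--hive-partition-key	Partition key to use when importing to Hive
	--hive-partition-value	Partition value to use when importing to Hive
	--hive-table	Table name to use when importing to Hive
	--map-column-hive	Override mapping for specific columns to Hive types
HBase arguments	--column-family	Set the target column family for the import
	--hbase-bulkload	Enable HBase bulk loading
	--hbase-create-table	If specified, create missing HBase tables
	--hbase-row-key	Specify which input column to use as the row key
	--hbase-table	HBase table to import into
HCatalog arguments	--hcatalog-database	HCatalog database name
	--hcatalog-home	Override $HCAT_HOME
	--hcatalog-partition-keys	Partition keys to use when importing to Hive
	--hcatalog-partition-values	Partition values to use when importing to Hive
	--hcatalog-table	HCatalog table name
	--map-column-hive	Override mapping for specific columns to Hive types
HCatalog import specific options	--create-hcatalog-table	Create the HCatalog table before import
	--drop-and-create-hcatalog-table	Drop and create the HCatalog table before import
	--hcatalog-storage-stanza	HCatalog storage stanza for table creation
Accumulo arguments	--accumulo-batch-size	Batch size in bytes
	--accumulo-column-family	Set the target column family for the import
	--accumulo-create-table	If specified, create missing Accumulo tables
	--accumulo-instance	Accumulo instance name
	--accumulo-max-latency	Max write latency in milliseconds
	--accumulo-password	Accumulo password
	--accumulo-row-key	Specify which input column to use as the row key
	--accumulo-table	Accumulo table to import into
	--accumulo-user	Accumulo user name
	--accumulo-visibility	Visibility token to apply to all imported rows
	--accumulo-zookeepers	Comma-separated list of ZooKeeper servers (host:port)
Code generation arguments	--bindir	Output directory for compiled objects
	--class-name	Set the generated class name; overrides --package-name. When combined with --jar-file, sets the input class
	--escape-mapping-column-names	Disable escaping of special characters in column names
	--input-null-non-string	Input null non-string representation
	--input-null-string	Input null string representation
	--jar-file	Disable code generation; use the specified jar
	--map-column-java	Override mapping for specific columns to Java types
	--null-non-string	Null non-string representation
	--null-string	Null string representation
	--outdir	Output directory for generated code
	--package-name	Put auto-generated classes in this package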
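A rough sketch of a mainframe import using the control arguments above. It assumes, as the Sqoop user guide's own import-mainframe example does, that `--connect` takes the mainframe host name for this tool. The host `mainframe.example.com`, the user `someuser`, the dataset `EMPLOYEE`, and the HDFS path are all placeholders. `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Import a partitioned dataset (--datasettype p) into an HDFS directory.
# Assumption: --connect is given a host name rather than a JDBC URL here.
run sqoop import-mainframe \
  --connect mainframe.example.com \
  --username someuser \
  -P \
  --dataset EMPLOYEE \
  --datasettype p \
  --target-dir /user/hadoop/mainframe/employee \
  --num-mappers 2
```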

job

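Parameter description

Category	Parameter	Description
Job management arguments	--create	Create a new saved job
	--delete	Delete a saved job
	--exec	Run a saved job
	--help	Print usage instructions
	--list	List saved jobs
	--meta-connect	Specify the JDBC connect string for the metastore
	--show	Show the parameters of a saved job
	--verbose	Print more information while working
Generic Hadoop command-line arguments	-conf	Specify an application configuration file
	-D <property=value>	Define a value for a given property
	-fs <file:///|hdfs://namenode:port>	Specify the default filesystem URL to use; overrides the 'fs.defaultFS' property from configurations
	-jt <local|resourcemanager:port>	Specify a ResourceManager
	-files <file1,...>	Specify a comma-separated list of files to be copied to the MapReduce cluster
	-libjars <jar1,...>	Specify a comma-separated list of jar files to be included in the classpath
	-archives <archive1,...>	Specify a comma-separated list of archives to be unarchived on the compute machines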
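The job management arguments above might be used over a saved job's lifetime as sketched below. The job name `nightly_import`, the database `testdb`, the table `employees`, and the HDFS path are placeholders. The standalone `--` separating the job arguments from the saved tool invocation follows the form in the Sqoop user guide and is not shown in the table. `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Save an import definition under a job name; everything after the
# standalone "--" is the tool and its arguments.
run sqoop job --create nightly_import -- import \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root \
  -P \
  --table employees \
  --target-dir /user/hadoop/employees

run sqoop job --list                    # list saved jobs
run sqoop job --show nightly_import     # show the saved parameters
run sqoop job --exec nightly_import     # run the saved job
run sqoop job --delete nightly_import   # remove the saved job
```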

list-tables

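Parameter description

Category	Parameter	Description
Common arguments	--connect	Specify JDBC connect string
	--connection-manager	Specify connection manager class name
	--connection-param-file	Specify connection parameters file
	--driver	Manually specify JDBC driver class to use
	--hadoop-home	Override the $HADOOP_HOME environment variable
	--hadoop-mapred-home	Override the $HADOOP_MAPRED_HOME environment variable
	--help	Print usage instructions
	--metadata-transaction-isolation-level	Define the transaction isolation level for metadata queries
	--oracle-escaping-disabled	Disable the escaping mechanism of the Oracle/OraOop connection managers
	-P	Read password from console
	--password	Set authentication password
	--password-alias	Credential provider password alias
	--password-file	Set authentication password file path
	--relaxed-isolation	Use read-uncommitted isolation for imports
	--skip-dist-cache	Skip copying jars to distributed cache
	--temporary-rootdir	Define the temporary root directory for the import
	--throw-on-error	Rethrow a RuntimeException if an error occurs during the job
	--username	Set authentication username
	--verbose	Print more information while working
Generic Hadoop command-line arguments	-conf	Specify an application configuration file
	-D <property=value>	Define a value for a given property
	-fs <file:///|hdfs://namenode:port>	Specify the default filesystem URL to use
	-jt <local|resourcemanager:port>	Specify a ResourceManager
	-files <file1,...>	Specify a comma-separated list of files to be copied to the MapReduce cluster
	-libjars <jar1,...>	Specify a comma-separated list of jar files to be included in the classpath
	-archives <archive1,...>	Specify a comma-separated list of archives to be unarchived on the compute machines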
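For symmetry with the list-databases example, a list-tables call could look like the sketch below. The database name `testdb` appended to the JDBC URL is a placeholder; `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# List the tables of the database named in the JDBC connect string.
run sqoop list-tables \
  --connect jdbc:mysql://hadoop100:3306/testdb \
  --username root \
  -P
```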

merge

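Parameter description

Parameter	Description	Example
--class-name	Name of the record class to load	--class-name com.example.MyClass
--help	Print usage instructions	--help
--jar-file	Load the class from the specified jar file	--jar-file /path/to/my.jar
--merge-key	Key column to use to join results	--merge-key id
--new-data	Path to the more recent dataset	--new-data /user/hadoop/new_data
--onto	Path to the older dataset	--onto /user/hadoop/old_data
--target-dir	Destination path for the merged results	--target-dir /user/hadoop/merged_data
--verbose	Print more information while working	--verbose
-conf	Specify an application configuration file	-conf /path/to/config.file
-D	Define a value for a given property	-D mapreduce.job.queuename=default
-fs	Specify the default filesystem URL to use; overrides the 'fs.defaultFS' property from configurations	-fs hdfs://namenode:8020
-jt	Specify a ResourceManager	-jt resourcemanager:8032
-files	Specify a comma-separated list of files to be copied to the MapReduce cluster	-files /path/to/file1,/path/to/file2
-libjars	Specify a comma-separated list of jar files to be included in the classpath	-libjars /path/to/jar1,/path/to/jar2
-archives	Specify a comma-separated list of archives to be unarchived on the compute machines	-archives /path/to/archive1,/path/to/archive2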
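Putting the example values from the table together, a merge invocation could be sketched as follows. The paths, jar, and class name are the illustrative values from the table, not real artifacts; `run` is a hypothetical dry-run helper that prints the command unless `RUN=1` is set.

```shell
# Hypothetical dry-run helper (not part of Sqoop): prints the command
# instead of executing it unless RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Merge the newer dataset onto the older one, joining rows on `id`,
# and write the result to a separate target directory.
run sqoop merge \
  --new-data /user/hadoop/new_data \
  --onto /user/hadoop/old_data \
  --target-dir /user/hadoop/merged_data \
  --jar-file /path/to/my.jar \
  --class-name com.example.MyClass \
  --merge-key id
```

The jar and record class would typically be those produced by an earlier import or a `codegen` run for the same table.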
