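Introduction:
Knox is a gateway that provides a single endpoint for authenticating to and accessing the Hadoop services in a cluster. Its goal is to simplify Hadoop security for both users and operators. Knox runs as a server (or a cluster of servers) and provides centralized access to one or more Hadoop clusters. In general, the goals of the gateway are as follows:
1. Provide perimeter security for the Hadoop REST APIs, making Hadoop security easier to set up and use.
    - Provide authentication and token verification at the perimeter
    - Enable authentication integration with enterprise and cloud identity management systems
    - Provide service-level authorization at the perimeter
2. Expose a single URL that aggregates the REST APIs of a Hadoop cluster.
    - Limit the network endpoints required to access a Hadoop cluster
    - Hide the internal Hadoop cluster topology from potential attackers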
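Knox in Detail:
This section covers the following three points:
1. How URLs are mapped between a gateway that serves multiple Hadoop clusters and the clusters themselves
2. How the gateway is configured through gateway-site.xml and cluster-specific topology files
3. How to configure the various policy enforcement provider features, such as authentication, authorization, auditing, and host mapping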
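URL Mapping
The gateway functions much like a reverse proxy. As such, it maintains a mapping from the URLs that the gateway exposes externally to the URLs provided by the Hadoop cluster.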
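Default Topology URLs
To provide compatibility with the Hadoop Java client and existing CLI tools, the Knox gateway offers a feature called the default topology. This refers to a topology deployment that can route URLs without the additional context the gateway normally uses to distinguish one Hadoop cluster from another. This allows the URLs to match those used by existing clients that may access WebHDFS through the Hadoop file system abstraction.
When a topology file is deployed with a file name that matches the configured default topology name, a specialized URL mapping is installed for that particular topology. This allows the URLs that existing Hadoop CLIs expect for WebHDFS to be used to interact with the specific Hadoop cluster represented by the default topology file.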
The configuration for the default topology name is found in gateway-site.xml as a property called: default.app.topology.name. The default value for this property is empty.
When deploying the sandbox.xml topology and setting default.app.topology.name to sandbox, both of the following example URLs work for the same underlying Hadoop cluster:
https://{gateway-host}:{gateway-port}/webhdfs
https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs
These default topology URLs exist for all of the services in the topology.
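The effect of the default topology can be sketched in a few lines of Python. This is only an illustration of the routing idea described above, not Knox's actual implementation: the helper name resolve_topology is hypothetical, and the example values (gateway for {gateway-path}, the /v1/tmp suffix) are assumptions chosen for the demonstration.

```python
from typing import Optional, Tuple


def resolve_topology(path: str, gateway_path: str,
                     default_topology: Optional[str]) -> Tuple[Optional[str], str]:
    """Return (topology name, service path) for a request path.

    Hypothetical helper: fully qualified paths carry the topology name,
    short paths fall back to default.app.topology.name (if it is set).
    """
    parts = [p for p in path.split("/") if p]
    # Fully qualified form: /{gateway-path}/{cluster-name}/{service...}
    if len(parts) >= 3 and parts[0] == gateway_path:
        return parts[1], "/" + "/".join(parts[2:])
    # Short form: only routable when a default topology is configured.
    service_path = "/" + "/".join(parts)
    return (default_topology or None), service_path


# Both URL styles end up at the same topology and service path.
print(resolve_topology("/webhdfs/v1/tmp", "gateway", "sandbox"))
print(resolve_topology("/gateway/sandbox/webhdfs/v1/tmp", "gateway", "sandbox"))
```

With default.app.topology.name left empty (passed here as None), the short form has no topology to route to, which is why the fully qualified form is then required.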
Fully Qualified URLs
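These mappings are generated from the combination of the gateway configuration file (i.e. {GATEWAY_HOME}/conf/gateway-site.xml) and the cluster topology descriptors (e.g. {GATEWAY_HOME}/conf/topologies/{cluster-name}.xml). The port numbers shown for the cluster URLs are the default ports for those services. The actual port numbers may differ for a given cluster.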
- WebHDFS
    - Gateway: https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs
    - Cluster: http://{webhdfs-host}:50070/webhdfs
The values for {gateway-host}, {gateway-port} and {gateway-path} are provided via the gateway configuration file (i.e. {GATEWAY_HOME}/conf/gateway-site.xml).
The value for {cluster-name} is derived from the file name of the cluster topology descriptor (e.g. {GATEWAY_HOME}/conf/topologies/{cluster-name}.xml).
The values for {webhdfs-host}, {webhcat-host}, {oozie-host}, {hbase-host} and {hive-host} are provided via the cluster topology descriptor (e.g. {GATEWAY_HOME}/conf/topologies/{cluster-name}.xml).
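To make the gateway-to-cluster mapping concrete, here is a minimal Python sketch of the rewrite a reverse proxy performs. It is an assumption-laden illustration rather than Knox's rewrite engine: the function to_cluster_url, the SERVICE_URLS table (standing in for what a topology descriptor supplies), and the host names, port 8443 and gateway path are all made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the topology descriptor:
# service name -> base URL of that service inside the cluster.
SERVICE_URLS = {"webhdfs": "http://namenode.example.com:50070/webhdfs"}


def to_cluster_url(gateway_url: str, gateway_path: str, cluster_name: str) -> str:
    """Map an external gateway URL to the internal cluster URL."""
    parts = urlsplit(gateway_url)
    prefix = f"/{gateway_path}/{cluster_name}/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"URL is not under {prefix}")
    # First segment after the prefix names the service; the rest is passed through.
    service, _, rest = parts.path[len(prefix):].partition("/")
    base = SERVICE_URLS[service]  # KeyError if the service is not in the topology
    url = base + ("/" + rest if rest else "")
    return url + ("?" + parts.query if parts.query else "")


print(to_cluster_url(
    "https://knox.example.com:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS",
    "gateway", "sandbox"))
```

The client only ever sees the gateway host and port; the internal host name and port 50070 stay hidden behind the mapping.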
Topology Port Mapping
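This feature allows a topology to be mapped to a port, so that a specific topology listens exclusively on a configured port. It routes URLs to the port-mapped topology without the additional context the gateway uses to distinguish one Hadoop cluster from another, just like the Default Topology URLs feature, but on a dedicated port.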
The configuration for Topology Port Mapping goes in the gateway-site.xml file. The configuration uses the property name and value model. The format for the property name is gateway.port.mapping.{topologyName} and the value is the port number that this topology will listen on.
In the following example, the topology development will listen on 9443 (if the port is not already taken).
<property>
    <name>gateway.port.mapping.development</name>
    <value>9443</value>
    <description>Topology and Port mapping</description>
</property>
With the above configuration, WebHDFS can be accessed at either of the following URLs:
https://{gateway-host}:9443/webhdfs
https://{gateway-host}:9443/{gateway-path}/development/webhdfs
Both of these URLs are valid for the configuration described above.
This feature is turned on by default. Use the property gateway.port.mapping.enabled to turn it on or off, e.g.
<property>
    <name>gateway.port.mapping.enabled</name>
    <value>true</value>
    <description>Enable/Disable port mapping feature.</description>
</property>
If a topology-mapped port is in use by another topology or process, an ERROR message is logged and gateway startup continues as normal. The default gateway port cannot be used for port mapping; use the Default Topology URLs feature instead.
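The port mapping rules above can be summarized in a short Python sketch. It only models the behavior described in this section; the function names parse_port_mappings and plan_port_mappings are hypothetical, the default gateway port 8443 is an example value, and skipping a conflicting mapping while continuing with the others is an assumption about what "startup continues as normal" means.

```python
from typing import Dict, List, Set, Tuple

PREFIX = "gateway.port.mapping."


def parse_port_mappings(props: Dict[str, str]) -> Dict[str, int]:
    """Extract topology -> port from gateway-site.xml style properties."""
    # The feature is on by default; only an explicit non-true value disables it.
    if props.get(PREFIX + "enabled", "true").lower() != "true":
        return {}
    return {name[len(PREFIX):]: int(value)
            for name, value in props.items()
            if name.startswith(PREFIX) and name != PREFIX + "enabled"}


def plan_port_mappings(mappings: Dict[str, int], default_port: int,
                       ports_in_use: Set[int]) -> Tuple[Dict[int, str], List[str]]:
    """Return (port -> topology listeners to open, error messages to log)."""
    listeners: Dict[int, str] = {}
    errors: List[str] = []
    for topology, port in mappings.items():
        if port == default_port:
            errors.append(f"{topology}: {port} is the default gateway port")
        elif port in listeners or port in ports_in_use:
            errors.append(f"{topology}: port {port} is already in use")
        else:
            listeners[port] = topology
    return listeners, errors


props = {"gateway.port.mapping.development": "9443",
         "gateway.port.mapping.enabled": "true"}
listeners, errors = plan_port_mappings(parse_port_mappings(props), 8443, set())
print(listeners, errors)
```

Errors are collected rather than raised, mirroring the note that a port conflict is logged while the rest of the gateway still starts.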