Adding a clickhouse type (TYPE) to Atlas and syncing its metadata

Preface: the Type system

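Atlas lets users define a model for the metadata objects they want to manage. The model is made up of definitions called "types". Instances of types, called "entities", represent the actual metadata objects being managed. The Type System is the component that lets users define and manage types and entities. Every metadata object that Atlas manages out of the box (Hive tables, for example) is modeled with types and represented as entities. To store a new kind of metadata in Atlas, you need to understand the concepts behind the Type System.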
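To make the distinction between a type and an entity concrete, here is a minimal Python sketch. The `demo_table` type, its attributes, and the `missing_required` helper are hypothetical illustrations and not part of Atlas; only the field names (`name`, `attributeDefs`, `isOptional`, `typeName`, `attributes`) follow the JSON used later in this article.

```python
# A "type" is a definition; an "entity" is an instance that names its type.
# demo_table and missing_required are hypothetical, for illustration only.

demo_table_type = {
    "name": "demo_table",
    "attributeDefs": [
        {"name": "qualifiedName", "typeName": "string", "isOptional": False},
        {"name": "comment", "typeName": "string", "isOptional": True},
    ],
}

demo_table_entity = {
    "typeName": "demo_table",  # points back at the type definition
    "attributes": {"qualifiedName": "db1.t1@primary"},
}


def missing_required(type_def, entity):
    """Return names of non-optional attributes the entity does not set."""
    if entity["typeName"] != type_def["name"]:
        raise ValueError("entity is not an instance of this type")
    present = entity.get("attributes", {})
    return [
        attr["name"]
        for attr in type_def["attributeDefs"]
        if not attr["isOptional"] and attr["name"] not in present
    ]


if __name__ == "__main__":
    print(missing_required(demo_table_type, demo_table_entity))
```

The check is deliberately simple: it only shows that an entity carries values while the type says which values are expected.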

I. Entity type definitions

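  • To sync clickhouse metadata into Atlas, first define the clickhouse-related types. (These definitions were modeled on the spark types; adjust the attributes to your own company's needs, since not every attribute will be useful.)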

    • clickhouse_db type:

    curl -i -X POST -H "Content-Type: application/json" -d '{
        "enumDefs": [],
        "structDefs": [],
        "classificationDefs": [],
        "entityDefs": [
            {
          "category": "ENTITY",
          "version": 1,
          "name": "clickhouse_db",
          "description": "clickhouse_db",
          "typeVersion": "1.0",
          "serviceType": "clickhouse",
          "attributeDefs": [
            {
              "name": "location",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": 5
            },
            {
              "name": "clusterName",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": 8
            },
            {
              "name": "parameters",
              "typeName": "map<string,string>",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "ownerType",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            }
          ],
          "superTypes": [
            "DataSet"
          ],
          "subTypes": [],
          "relationshipAttributeDefs": [
            {
              "name": "inputToProcesses",
              "typeName": "array<Process>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "dataset_process_inputs",
              "isLegacyAttribute": false
            },
            {
              "name": "schema",
              "typeName": "array<avro_schema>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "avro_schema_associatedEntities",
              "isLegacyAttribute": false
            },
            {
              "name": "tables",
              "typeName": "array<clickhouse_table>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "clickhouse_table_db",
              "isLegacyAttribute": false
            },
            {
              "name": "meanings",
              "typeName": "array<AtlasGlossaryTerm>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "AtlasGlossarySemanticAssignment",
              "isLegacyAttribute": false
            },
            {
              "name": "outputFromProcesses",
              "typeName": "array<Process>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "process_dataset_outputs",
              "isLegacyAttribute": false
            }
          ],
          "businessAttributeDefs": {}
        }
        ],
        "relationshipDefs": []
    }' --user admin:admin "http://localhost:21000/api/atlas/v2/types/typedefs"
    • clickhouse_table type:

    curl -i -X POST -H "Content-Type: application/json" -d '{
        "enumDefs": [],
        "structDefs": [],
        "classificationDefs": [],
        "entityDefs": [
            {
          "category": "ENTITY",
          "version": 1,
          "name": "clickhouse_table",
          "description": "clickhouse_table",
          "typeVersion": "1.0",
          "serviceType": "clickhouse",
          "attributeDefs": [
            {
              "name": "tableType",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "provider",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": 5
            },
            {
              "name": "partitionColumnNames",
              "typeName": "array<string>",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "bucketSpec",
              "typeName": "map<string,string>",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "ownerType",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "createTime",
              "typeName": "date",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "parameters",
              "typeName": "map<string,string>",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "comment",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": 9
            },
            {
              "name": "unsupportedFeatures",
              "typeName": "array<string>",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "viewOriginalText",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": 9
            },
            {
              "name": "schemaDesc",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": 5
            },
            {
              "name": "partitionProvider",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            }
          ],
          "superTypes": [
            "DataSet"
          ],
          "subTypes": [],
          "relationshipAttributeDefs": [
            {
              "name": "inputToProcesses",
              "typeName": "array<Process>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "dataset_process_inputs",
              "isLegacyAttribute": false
            },
            {
              "name": "schema",
              "typeName": "array<avro_schema>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "avro_schema_associatedEntities",
              "isLegacyAttribute": false
            },
            {
              "name": "sd",
              "typeName": "clickhouse_storagedesc",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "clickhouse_table_storagedesc",
              "isLegacyAttribute": false
            },
            {
              "name": "columns",
              "typeName": "array<clickhouse_column>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "constraints": [
                {
                  "type": "ownedRef"
                }
              ],
              "relationshipTypeName": "clickhouse_table_columns",
              "isLegacyAttribute": false
            },
            {
              "name": "meanings",
              "typeName": "array<AtlasGlossaryTerm>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "AtlasGlossarySemanticAssignment",
              "isLegacyAttribute": false
            },
            {
              "name": "db",
              "typeName": "clickhouse_db",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "clickhouse_table_db",
              "isLegacyAttribute": false
            },
            {
              "name": "outputFromProcesses",
              "typeName": "array<Process>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "process_dataset_outputs",
              "isLegacyAttribute": false
            }
          ]
        }
        ],
        "relationshipDefs": []
    }' --user admin:admin "http://localhost:21000/api/atlas/v2/types/typedefs"
    • clickhouse_column type:

    curl -i -X POST -H "Content-Type: application/json" -d '{
        "enumDefs": [],
        "structDefs": [],
        "classificationDefs": [],
        "entityDefs": [
            {
          "category": "ENTITY",
          "version": 1,
          "name": "clickhouse_column",
          "description": "clickhouse_column",
          "typeVersion": "1.0",
          "serviceType": "clickhouse",
          "attributeDefs": [
            {
              "name": "type",
              "typeName": "string",
              "isOptional": false,
              "cardinality": "SINGLE",
              "valuesMinCount": 1,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": true,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "nullable",
              "typeName": "boolean",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "metadata",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "comment",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": 9
            }
          ],
          "superTypes": [
            "DataSet"
          ],
          "subTypes": [],
          "relationshipAttributeDefs": [
            {
              "name": "inputToProcesses",
              "typeName": "array<Process>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "dataset_process_inputs",
              "isLegacyAttribute": false
            },
            {
              "name": "schema",
              "typeName": "array<avro_schema>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "avro_schema_associatedEntities",
              "isLegacyAttribute": false
            },
            {
              "name": "meanings",
              "typeName": "array<AtlasGlossaryTerm>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "AtlasGlossarySemanticAssignment",
              "isLegacyAttribute": false
            },
            {
              "name": "table",
              "typeName": "clickhouse_table",
              "isOptional": false,
              "cardinality": "SINGLE",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "clickhouse_table_columns",
              "isLegacyAttribute": false
            },
            {
              "name": "outputFromProcesses",
              "typeName": "array<Process>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "process_dataset_outputs",
              "isLegacyAttribute": false
            }
          ],
          "businessAttributeDefs": {
            "Description": [
              {
                "name": "index_desc",
                "typeName": "string",
                "isOptional": true,
                "cardinality": "SINGLE",
                "valuesMinCount": 0,
                "valuesMaxCount": 1,
                "isUnique": false,
                "isIndexable": true,
                "includeInNotification": false,
                "searchWeight": 5,
                "options": {
                  "applicableEntityTypes": "[\"hive_column\",\"clickhouse_column\"]",
                  "maxStrLength": "10000"
                }
              }
            ]
          }
        }
        ],
        "relationshipDefs": []
    }' --user admin:admin "http://localhost:21000/api/atlas/v2/types/typedefs"
    • clickhouse_storagedesc type:

    curl -i -X POST -H "Content-Type: application/json" -d '{
        "enumDefs": [],
        "structDefs": [],
        "classificationDefs": [],
        "entityDefs": [
            {
          "category": "ENTITY",
          "version": 1,
          "name": "clickhouse_storagedesc",
          "description": "clickhouse_storagedesc",
          "typeVersion": "1.0",
          "serviceType": "clickhouse",
          "attributeDefs": [
            {
              "name": "location",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": 10
            },
            {
              "name": "inputFormat",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "outputFormat",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "serde",
              "typeName": "string",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "compressed",
              "typeName": "boolean",
              "isOptional": false,
              "cardinality": "SINGLE",
              "valuesMinCount": 1,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": true,
              "includeInNotification": false,
              "searchWeight": -1
            },
            {
              "name": "parameters",
              "typeName": "map<string,string>",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": 0,
              "valuesMaxCount": 1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1
            }
          ],
          "superTypes": [
            "Referenceable"
          ],
          "subTypes": [],
          "relationshipAttributeDefs": [
            {
              "name": "meanings",
              "typeName": "array<AtlasGlossaryTerm>",
              "isOptional": true,
              "cardinality": "SET",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "AtlasGlossarySemanticAssignment",
              "isLegacyAttribute": false
            },
            {
              "name": "table",
              "typeName": "clickhouse_table",
              "isOptional": true,
              "cardinality": "SINGLE",
              "valuesMinCount": -1,
              "valuesMaxCount": -1,
              "isUnique": false,
              "isIndexable": false,
              "includeInNotification": false,
              "searchWeight": -1,
              "relationshipTypeName": "clickhouse_table_storagedesc",
              "isLegacyAttribute": false
            }
          ],
          "businessAttributeDefs": {}
        }
        ],
        "relationshipDefs": []
    }' --user admin:admin "http://localhost:21000/api/atlas/v2/types/typedefs"
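Most of the attribute definitions above differ only in name, type, and search weight, so the request body can be generated instead of written by hand. The Python sketch below shows one way to do that. The helper names `attribute_def`, `entity_def`, and `typedefs_payload` are hypothetical, and the defaults simply mirror the optional single-valued attributes listed above. It uses the top-level keys `enumDefs` and `structDefs` that appear in section II, and it leaves out `relationshipAttributeDefs`, `subTypes`, and `businessAttributeDefs`; whether your Atlas version needs those in the request is something to verify.

```python
import json


def attribute_def(name, type_name, search_weight=-1, optional=True, indexable=False):
    """Build one single-valued attributeDefs entry like those listed above."""
    return {
        "name": name,
        "typeName": type_name,
        "isOptional": optional,
        "cardinality": "SINGLE",
        "valuesMinCount": 0 if optional else 1,
        "valuesMaxCount": 1,
        "isUnique": False,
        "isIndexable": indexable,
        "includeInNotification": False,
        "searchWeight": search_weight,
    }


def entity_def(name, attributes, super_types=("DataSet",)):
    """Build an entity type definition for the clickhouse service type."""
    return {
        "category": "ENTITY",
        "version": 1,
        "name": name,
        "description": name,
        "typeVersion": "1.0",
        "serviceType": "clickhouse",
        "attributeDefs": list(attributes),
        "superTypes": list(super_types),
    }


def typedefs_payload(entity_defs):
    """Wrap entity definitions in the top-level typedefs structure."""
    return {
        "enumDefs": [],
        "structDefs": [],
        "classificationDefs": [],
        "entityDefs": list(entity_defs),
        "relationshipDefs": [],
    }


clickhouse_db = entity_def("clickhouse_db", [
    attribute_def("location", "string", search_weight=5),
    attribute_def("clusterName", "string", search_weight=8),
    attribute_def("parameters", "map<string,string>"),
    attribute_def("ownerType", "string"),
])

if __name__ == "__main__":
    print(json.dumps(typedefs_payload([clickhouse_db]), indent=2))
```

The printed JSON can be saved to a file and passed to curl with `-d @file.json` in place of the inline body.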

II. Defining relationship types

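This step is essential. Without it, the entities you create cannot be linked to one another; for example, you cannot navigate from a table to its columns.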

Request body for POST /api/atlas/v2/types/typedefs:
{
  "entityDefs": [],
  "classificationDefs": [],
  "structDefs": [],
  "enumDefs": [],
  "relationshipDefs": [
    {
      "category": "RELATIONSHIP",
      "version": 1,
      "name": "clickhouse_table_db",
      "description": "clickhouse_table_db",
      "typeVersion": "1.0",
      "serviceType": "clickhouse",
      "attributeDefs": [],
      "relationshipCategory": "AGGREGATION",
      "propagateTags": "NONE",
      "endDef1": {
        "type": "clickhouse_table",
        "name": "db",
        "isContainer": false,
        "cardinality": "SINGLE",
        "isLegacyAttribute": false
      },
      "endDef2": {
        "type": "clickhouse_db",
        "name": "tables",
        "isContainer": true,
        "cardinality": "SET",
        "isLegacyAttribute": false
      }
    },
    {
      "category": "RELATIONSHIP",
      "version": 1,
      "name": "clickhouse_table_columns",
      "description": "clickhouse_table_columns",
      "typeVersion": "1.0",
      "serviceType": "clickhouse",
      "attributeDefs": [],
      "relationshipCategory": "COMPOSITION",
      "propagateTags": "NONE",
      "endDef1": {
        "type": "clickhouse_table",
        "name": "columns",
        "isContainer": true,
        "cardinality": "SET",
        "isLegacyAttribute": false
      },
      "endDef2": {
        "type": "clickhouse_column",
        "name": "table",
        "isContainer": false,
        "cardinality": "SINGLE",
        "isLegacyAttribute": false
      }
    },
    {
      "category": "RELATIONSHIP",
      "version": 1,
      "name": "clickhouse_table_storagedesc",
      "description": "clickhouse_table_storagedesc",
      "typeVersion": "1.0",
      "serviceType": "clickhouse",
      "attributeDefs": [],
      "relationshipCategory": "ASSOCIATION",
      "propagateTags": "NONE",
      "endDef1": {
        "type": "clickhouse_table",
        "name": "sd",
        "isContainer": false,
        "cardinality": "SINGLE",
        "isLegacyAttribute": false
      },
      "endDef2": {
        "type": "clickhouse_storagedesc",
        "name": "table",
        "isContainer": false,
        "cardinality": "SINGLE",
        "isLegacyAttribute": false
      }
    }
  ]
}
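The relationship body above is shown without a command. One way to submit it, mirroring the curl calls from section I, is sketched below with Python's standard library. The function name `typedefs_request` is hypothetical; the URL and the admin:admin credentials are the ones used in the curl examples, so adjust them for your deployment. The entity types from section I presumably need to exist before the relationship types that reference them are created.

```python
import base64
import json
import urllib.request

ATLAS_TYPEDEFS_URL = "http://localhost:21000/api/atlas/v2/types/typedefs"


def typedefs_request(payload, user="admin", password="admin", url=ATLAS_TYPEDEFS_URL):
    """Build (but do not send) a POST request carrying a typedefs JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    body = {
        "entityDefs": [],
        "classificationDefs": [],
        "structDefs": [],
        "enumDefs": [],
        "relationshipDefs": [],  # paste the three relationship definitions here
    }
    request = typedefs_request(body)
    # Sending requires a running Atlas instance:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status, response.read().decode("utf-8"))
    print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it easy to inspect the body and headers before anything reaches Atlas.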

III. Collecting metadata

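I won't go into detail here, since there are many ways to do this. Because clickhouse metadata involves little lineage, we simply built a table from clickhouse's own system metadata that holds the main information: database name, table name, the column's English name, the column's Chinese name, and so on. Here is a simple SQL query that you can adapt to your situation; writing your own hook tool works too.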

select
        database `db_name`,
        table `tbl_name`,
        name `column_name`,
        type `column_type`,
        default_expression `default_expression`, -- a default value, not a nullability flag; nullable columns have a Nullable(...) type
        is_in_partition_key `partition_key`,
        is_in_primary_key `column_key`,
        comment `column_comment`,
        position `column_position`
    from system.columns
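The rows returned by this query are flat, one per column, while Atlas entities are nested: a database has tables and a table has columns. A small Python sketch of that regrouping follows. It assumes each row arrives as a dict keyed by the aliases in the query; `group_columns` and the sample rows are hypothetical, and fetching the rows from clickhouse is left to whatever client you use.

```python
from collections import defaultdict


def group_columns(rows):
    """Group flat rows into {db_name: {tbl_name: [column rows in position order]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        grouped[row["db_name"]][row["tbl_name"]].append(row)
    return {
        db_name: {
            tbl_name: sorted(cols, key=lambda col: col["column_position"])
            for tbl_name, cols in tables.items()
        }
        for db_name, tables in grouped.items()
    }


if __name__ == "__main__":
    sample_rows = [
        {"db_name": "bi_app", "tbl_name": "wuxl_0316_rr", "column_name": "column2",
         "column_type": "String", "column_comment": "ziduan2", "column_position": 2},
        {"db_name": "bi_app", "tbl_name": "wuxl_0316_rr", "column_name": "column1",
         "column_type": "String", "column_comment": "ziduan1", "column_position": 1},
    ]
    for db_name, tables in group_columns(sample_rows).items():
        for tbl_name, cols in tables.items():
            print(db_name, tbl_name, [col["column_name"] for col in cols])
```

Sorting by `column_position` keeps the columns in the order clickhouse reports them, regardless of the order in which the rows were fetched.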

IV. Syncing metadata

There are two ways to sync metadata: the API that ships with Atlas, or writing messages to the Kafka that Atlas uses. Each is described below.
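For the API route, the entity JSON does not have to be written by hand. The Python sketch below builds one table entity and its column entities from column rows, following two conventions visible in the sample payload of section IV.1: qualified names of the form `db.table@cluster` and `db.table.column@cluster`, and negative placeholder GUIDs for entities that do not exist yet. The function names and the `cluster="primary"` default are assumptions for illustration. Unlike the sample payload, which nests full column objects inside the table, this sketch lists the columns as separate entities and links them by GUID; treat that as an assumption to verify against your Atlas version. The database and storage descriptor references are omitted for brevity.

```python
import itertools

_placeholder = itertools.count(1)


def next_guid():
    """Return a negative placeholder GUID string, unique within this process."""
    return f"-{next(_placeholder)}"


def table_entities(db_name, tbl_name, columns, cluster="primary"):
    """Build a clickhouse_table entity plus its clickhouse_column entities."""
    table_guid = next_guid()
    table_ref = {"typeName": "clickhouse_table", "guid": table_guid}
    column_entities = [
        {
            "typeName": "clickhouse_column",
            "guid": next_guid(),
            "attributes": {
                "qualifiedName": f"{db_name}.{tbl_name}.{col['column_name']}@{cluster}",
                "name": col["column_name"],
                "type": col["column_type"],
                "comment": col["column_comment"],
                "table": table_ref,  # back-reference to the owning table
            },
        }
        for col in columns
    ]
    table = {
        "typeName": "clickhouse_table",
        "guid": table_guid,
        "attributes": {
            "qualifiedName": f"{db_name}.{tbl_name}@{cluster}",
            "name": tbl_name,
            "columns": [
                {"typeName": "clickhouse_column", "guid": col["guid"]}
                for col in column_entities
            ],
        },
    }
    return {"entities": [table] + column_entities}


if __name__ == "__main__":
    import json

    cols = [
        {"column_name": "column1", "column_type": "String", "column_comment": "ziduan1"},
        {"column_name": "column2", "column_type": "String", "column_comment": "ziduan2"},
    ]
    print(json.dumps(table_entities("bi_app", "wuxl_0316_rr", cols), indent=2))
```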

1. Built-in API

The API documentation can be found at the following path:

Once you have found this API, click "try it out" and enter the following JSON:

  {"entities": [
                {
                    "typeName": "clickhouse_table",
                    "attributes": {
                        "owner": "bi",
                        "ownerType": "USER",
                        "sd": {
                            "typeName": "clickhouse_storagedesc",
                            "attributes": {
                                "qualifiedName": "bi_app.wuxl_0316_rr@primary_storage",
                                "name": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                                "location": "hdfs://HDFS80727/bi/bi_app.db/wuxl_0316_rr",
                                "compressed": false,
                                "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                                "outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
                                "parameters": {
                                    "serialization.format": "1"
                                }
                            },
                            "guid": "-28224574948884002",
                            "version": 0,
                            "proxy": false
                        },
                        "tableType": "MANAGED",
                        "createTime": 1709003223000,
                        "qualifiedName": "bi_app.wuxl_0316_rr@primary",
                        "columns": [
                            {
                                "typeName": "clickhouse_column",
                                "attributes": {
                                    "qualifiedName": "bi_app.wuxl_0316_rr.column1@primary",
                                    "name": "column1",
                                    "comment": "ziduan1",
                                    "type": "string",
                                    "table": {
                                        "typeName": "clickhouse_table",
                                        "attributes": {
                                            "qualifiedName": "bi_app.wuxl_0316_rr@primary"
                                        },
                                        "guid": "-28224574948884003",
                                        "version": 0,
                                        "proxy": false
                                    }
                                },
                                "guid": "-28224574948884005",
                                "version": 0,
                                "proxy": false
                            },
                            {
                                "typeName": "clickhouse_column",
                                "attributes": {
                                    "qualifiedName": "bi_app.wuxl_0316_rr.column2@primary",
                                    "name": "column2",
                                    "comment": "ziduan2",
                                    "type": "string",
                                    "table": {
                                        "typeName": "clickhouse_table",
                                        "attributes": {
                                            "qualifiedName": "bi_app.wuxl_0316_rr@primary"
                                        },
                                        "guid": "-28224574948884003",
                                        "version": 0,
                                        "proxy": false
                                    }
                                },
                                "guid": "-28224574948884006",
                                "version": 0,
                                "proxy": false
                            }
                        ],
                        "name": "wuxl_0316_rr",
                        "comment": "test table",
                        "parameters": {
                            "transient_lastDdlTime": "1709003223"
                        },
                        "db": {
                            "typeName": "clickhouse_db",
                            "attributes": {
                                "owner": "bi",
                                "ownerType": "USER",
                                "qualifiedName": "bi_app@primary",
                                "clusterName": "primary",
                                "name": "bi_app",
                                "description": "",
                                "location": "hdfs://HDFS80727/bi/bi_app.db",
                                "parameters": {

                                }
                            },
                            "guid": "-28224574948884001",
                            "version": 0,
                            "proxy": false
                        }
                    },
                    "guid": "-28224574948884003",
                    "version": 0,
                    "proxy": false,
                    "relationships":{
                        "typeName":"clickhouse_table_db",
                        "db":["-28224574948884001"],
                        "end1": {
                          "typeName": "clickhouse_table",
                          "guid": "-28224574948884003"
                        },
                        "end2": {
                          "typeName": "clickhouse_db",
                          "guid": "-28224574948884001"
                        }
                    }
                },
                {
                    "typeName": "clickhouse_db",
                    "attributes": {
                        "owner": "bi",
                        "ownerType": "USER",
                        "qualifiedName": "bi_app@primary",
                        "clusterName": "primary",
                        "name": "bi_app",
                        "description": "",
                        "location": "hdfs://HDFS80727/bi/bi_app.db",
                        "parameters": {

                        }
                    },
                    "guid": "-28224574948884001",
                    "version": 0,
                    "proxy": false,
                    "relationships":{}
                },
                {
                    "typeName": "clickhouse_storagedesc",
                    "attributes": {
                        "qualifiedName": "bi_app.wuxl_0316_rr@primary_storage",
                        "name": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                        "location": "hdfs://HDFS80727/bi/bi_app.db/wuxl_0316_rr",
                        "compressed": false,
                        "inputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                        "outputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
                        "parameters": {
                            "serialization.format": "1"
                        }
                    },
                    "guid": "-28224574948884002",
                    "version": 0,
                    "proxy": false,
                    "relationships":{
                        "typeName":"clickhouse_table_storagedesc",
                        "table":["-28224574948884003"],
                        "end1": {
                          "typeName": "clickhouse_storagedesc",
                          "guid": "-28224574948884002"
                        },
                        "end2": {
                          "typeName": "clickhouse_table",
                          "guid": "-28224574948884003"
                        }
                    }
                },
                {
                    "typeName": "clickhouse_column",
                    "attributes": {
                        "qualifiedName": "bi_app.wuxl_0316_rr.column1@primary",
                        "name": "column1",
                        "comment": "ziduan1",
                        "type": "string",
                        "table": {
                            "typeName": "clickhouse_table",
                            "attributes": {
                                "qualifiedName": "bi_app.wuxl_0316_rr@primary"
                            },
                            "guid": "-28224574948884003",
                            "version": 0,
                            "proxy": false
                        }
                    },
                    "guid": "-28224574948884005",
                    "version": 0,
                    "proxy": false,
                    "relationships":{
                        "typeName":"clickhouse_table_columns",
                        "table":["-28224574948884003"],
                        "end1": {
                          "typeName": "clickhouse_column",
                          "guid": "-28224574948884005"
                        },
                        "end2": {
                          "typeName": "clickhouse_table",
                          "guid": "-28224574948884003"
                        }
                    }
                },
                {
                    "typeName": "clickhouse_column",
                    "attributes": {
                        "qualifiedName": "bi_app.wuxl_0316_rr.column2@primary",
                        "name": "column2",
                        "comment": "ziduan2",
                        "type": "string",
                        "table": {
                            "typeName": "clickhouse_table",
                            "attributes": {
                                "qualifiedName": "bi_app.wuxl_0316_rr@primary"
                            },
                            "guid": "-28224574948884003",
                            "version": 0,
                            "proxy": false
                        }
                    },
                    "guid": "-28224574948884006",
                    "version": 0,
                    "proxy": false,
                    "relationships":{
                        "typeName":"clickhouse_table_columns",
                        "table":["-28224574948884003"],
                        "end1": {
                          "typeName": "clickhouse_column",
                          "guid": "-28224574948884006"
                        },
                        "end2": {
                          "typeName": "clickhouse_table",
                          "guid": "-28224574948884003"
                        }
                    }
                }
            ]}
2.Kafka

Write messages directly to Atlas's built-in "ATLAS_HOOK" topic; Atlas parses each message and creates the entities and the relationships between them.

sql
-- Use Flink SQL to write a message to Atlas's built-in topic
CREATE TABLE ads_zdm_offsite_platform_daren_rank_df_to_kafka (
        data string
) WITH (
  'connector' = 'kafka',
  'topic' = 'ATLAS_HOOK',
  'properties.bootstrap.servers' = 'localhost:9092', 
  'format' = 'raw'
);

insert into ads_zdm_offsite_platform_daren_rank_df_to_kafka
select '{"version":{"version":"1.0.0","versionParts":[1]},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1,"msgSourceIP":"10.45.1.116","msgCreatedBy":"bi","msgCreationTime":1710575827820,"message":{"type":"ENTITY_CREATE_V2","user":"bi","entities":{"entities":[{"typeName":"clickhouse_table","attributes":{"owner":"bi","ownerType":"USER","sd":{"typeName":"clickhouse_storagedesc","attributes":{"qualifiedName":"test.wuxl_0316_ss@primary_storage","name":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","location":"hdfs://HDFS80727/bi/test.db/wuxl_0316_ss","compressed":false,"inputFormat":"org.apache.hadoop.mapred.TextInputFormat","outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","parameters":{"serialization.format":"1"}},"guid":"-861237351166887","version":0,"proxy":false},"tableType":"MANAGED","createTime":1710575827000,"qualifiedName":"test.wuxl_0316_ss@primary","columns":[{"typeName":"clickhouse_column","attributes":{"qualifiedName":"test.wuxl_0316_ss.column_tt_1@primary","name":"column_tt_1","comment":"测试字段1","type":"string","table":{"typeName":"clickhouse_table","attributes":{"qualifiedName":"test.wuxl_0316_ss@primary"},"guid":"-861237351166888","version":0,"proxy":false}},"guid":"-861237351166890","version":0,"proxy":false},{"typeName":"clickhouse_column","attributes":{"qualifiedName":"test.wuxl_0316_ss.column_tt_2@primary","name":"column_tt_2","comment":"测试字段2","type":"string","table":{"typeName":"clickhouse_table","attributes":{"qualifiedName":"test.wuxl_0316_ss@primary"},"guid":"-861237351166888","version":0,"proxy":false}},"guid":"-861237351166891","version":0,"proxy":false}],"name":"wuxl_0316_ss","comment":"测试表","parameters":{"transient_lastDdlTime":"1710575827"},"db":{"typeName":"clickhouse_db","attributes":{"owner":"bi","ownerType":"USER","qualifiedName":"test@primary","clusterName":"primary","name":"test","description":"","location":"hdfs://HDFS80727/bi/test.db","parameters":{}},"guid":"-861237351166886","version":0,
"proxy":false}},"guid":"-861237351166888","version":0,"proxy":false},{"typeName":"clickhouse_db","attributes":{"owner":"bi","ownerType":"USER","qualifiedName":"test@primary","clusterName":"primary","name":"test","description":"","location":"hdfs://HDFS80727/bi/test.db","parameters":{}},"guid":"-861237351166886","version":0,"proxy":false},{"typeName":"clickhouse_storagedesc","attributes":{"qualifiedName":"test.wuxl_0316_ss@primary_storage","name":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","location":"hdfs://HDFS80727/bi/test.db/wuxl_0316_ss","compressed":false,"inputFormat":"org.apache.hadoop.mapred.TextInputFormat","outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","parameters":{"serialization.format":"1"}},"guid":"-861237351166887","version":0,"proxy":false},{"typeName":"clickhouse_column","attributes":{"qualifiedName":"test.wuxl_0316_ss.column_tt_1@primary","name":"column_tt_1","comment":"测试字段1","type":"string","table":{"typeName":"clickhouse_table","attributes":{"qualifiedName":"test.wuxl_0316_ss@primary"},"guid":"-861237351166888","version":0,"proxy":false}},"guid":"-861237351166890","version":0,"proxy":false},{"typeName":"clickhouse_column","attributes":{"qualifiedName":"test.wuxl_0316_ss.column_tt_2@primary","name":"column_tt_2","comment":"测试字段2","type":"string","table":{"typeName":"clickhouse_table","attributes":{"qualifiedName":"test.wuxl_0316_ss@primary"},"guid":"-861237351166888","version":0,"proxy":false}},"guid":"-861237351166891","version":0,"proxy":false}]}}}' as data
;
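
Hand-writing the one-line JSON above is error-prone, so here is a minimal Python sketch that builds the same envelope and quotes it as a SQL string literal. The envelope fields are copied from the message above; the helper names `build_hook_message` and `to_sql_literal` are hypothetical, and the entity is trimmed to a few attributes for brevity.

```python
import json
import time


def build_hook_message(entities, user, source_ip):
    """Wrap entity dicts in the envelope layout used by the message above."""
    return {
        "version": {"version": "1.0.0", "versionParts": [1]},
        "msgCompressionKind": "NONE",
        "msgSplitIdx": 1,
        "msgSplitCount": 1,
        "msgSourceIP": source_ip,
        "msgCreatedBy": user,
        "msgCreationTime": int(time.time() * 1000),  # epoch milliseconds
        "message": {
            "type": "ENTITY_CREATE_V2",
            "user": user,
            "entities": {"entities": entities},
        },
    }


def to_sql_literal(message):
    """Serialize to one-line JSON and quote it as a SQL string literal."""
    payload = json.dumps(message, ensure_ascii=False, separators=(",", ":"))
    # A single quote inside a SQL string literal is escaped by doubling it.
    return "'" + payload.replace("'", "''") + "'"


# Trimmed example entity; a real message carries the full attribute set.
db = {
    "typeName": "clickhouse_db",
    "attributes": {"qualifiedName": "test@primary", "clusterName": "primary", "name": "test"},
    "guid": "-861237351166886",
    "version": 0,
    "proxy": False,
}
message = build_hook_message([db], user="bi", source_ip="10.45.1.116")
literal = to_sql_literal(message)
print(f"insert into ads_zdm_offsite_platform_daren_rank_df_to_kafka select {literal} as data;")
```

The printed statement can be pasted in place of the hand-written `insert` above. Generating the payload also keeps it on a single line, which avoids accidental line breaks inside the string.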

V. Additional notes

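Pay attention to the guids used in step four. Because these entities are new, Atlas has not yet assigned them a guid (globally unique ID), so you can generate a placeholder id of the form "-xxxxxxxxx" yourself. This placeholder is not written to Atlas; it only serves to express the relationships between entities within the request, for example which table a column belongs to. When you create a column entity, set the placeholder guid of its table on the column, and Atlas will create the corresponding relationship when it creates the entities.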
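
The placeholder-guid idea can be sketched in Python as follows. This is an illustration under assumptions, not Atlas client code: `new_placeholder_guid`, `table_ref`, and `column_entity` are hypothetical helpers, and the only rule taken from the text is that a placeholder is a "-"-prefixed id that must be reused wherever the same entity is referenced.

```python
import itertools
import json
import time

# Hypothetical generator: a counter seeded from the clock, so every
# placeholder within one request is distinct.
_counter = itertools.count(time.time_ns())


def new_placeholder_guid():
    """Return a '-'-prefixed placeholder id for a not-yet-created entity."""
    return "-" + str(next(_counter))


def table_ref(table_guid, table_qn):
    """Reference to the table entity, identified by its placeholder guid."""
    return {
        "typeName": "clickhouse_table",
        "attributes": {"qualifiedName": table_qn},
        "guid": table_guid,
        "version": 0,
        "proxy": False,
    }


def column_entity(table_guid, table_qn, name, comment, col_type="string"):
    """Build a column entity that points back at its table."""
    db_table, cluster = table_qn.split("@")
    return {
        "typeName": "clickhouse_column",
        "attributes": {
            "qualifiedName": f"{db_table}.{name}@{cluster}",
            "name": name,
            "comment": comment,
            "type": col_type,
            "table": table_ref(table_guid, table_qn),
        },
        "guid": new_placeholder_guid(),
        "version": 0,
        "proxy": False,
    }


table_qn = "bi_app.wuxl_0316_rr@primary"
table_guid = new_placeholder_guid()  # reused by every column of this table
columns = [
    column_entity(table_guid, table_qn, "column1", "ziduan1"),
    column_entity(table_guid, table_qn, "column2", "ziduan2"),
]
print(json.dumps(columns, indent=2, ensure_ascii=False))
```

Each column gets its own placeholder, while both columns carry the same table placeholder; that shared value is what ties them to the table inside one request.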
