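Data Warehouse Notes, Part 4: The Star Schema Layer (Dimensional Modeling)
What Is a Star Schema?
A Star Schema is the dimensional modeling approach popularized by Ralph Kimball, and it serves as the analysis-facing core layer of a data warehouse.
These notes use SQL Server, and all sample scripts target it; implementations on other databases will differ slightly.
A Star Schema is shaped like a star: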
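                 ┌───────────────┐
                 │  Fact table   │  ← center of the star
                 │   FACT_XXX    │    stores business measures (sales amount, quantity, etc.)
                 │               │    holds multiple foreign keys to the dimension tables
                 └───────┬───────┘
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
       ↓                 ↓                 ↓
  ┌──────────┐      ┌──────────┐      ┌──────────┐
  │   Date   │      │ Customer │      │ Product  │  ← dimension tables (points of the star)
  │ DIM_DATE │      │ DIM_CUST │      │ DIM_PROD │    describe who / what / when / where
  └──────────┘      └──────────┘      └──────────┘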
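If you follow Power BI tutorials, many of them pull data straight from this layer, that is, from the fact tables and dimension tables. I personally don't recommend reporting directly from this layer; I prefer to read from the next layer instead. See the next post, on the data mart layer, for the details.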
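Core Concepts
Fact Table
Stores the measurable numeric values of the business (measures) and is the core of data analysis.
Characteristics:
✓ Mostly numeric measures (sales amount, order quantity, profit margin)
✓ Contains multiple foreign keys that point to dimension tables
✓ Usually very large (millions to hundreds of millions of rows)
✓ Supports aggregation (SUM, COUNT, AVG, and so on)
Common fact table types:
| Type | Description | Examples |
|---|---|---|
| Transaction fact table | One row per business event | Sales orders, return records |
| Periodic snapshot table | One row per entity per fixed period | Daily inventory balance, month-end account balance |
| Accumulating snapshot table | One row per process instance, updated as it progresses | Order lifecycle (placed → paid → shipped → received) |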
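To make the difference in grain concrete, here is a minimal sketch of what a periodic snapshot table might look like. The table `fact_inventory_daily` and its columns are hypothetical; they are not part of the scripts in this series and only illustrate the "one row per entity per period" idea.

```sql
-- Hypothetical periodic snapshot: one row per product per day,
-- written every day whether or not any stock movement happened.
CREATE TABLE dbo.fact_inventory_daily (
    date_key         INT           NOT NULL,  -- snapshot date, YYYYMMDD
    product_key      BIGINT        NOT NULL,
    quantity_on_hand INT           NOT NULL,  -- semi-additive: sum across products, not across dates
    inventory_value  DECIMAL(14,2) NOT NULL,
    CONSTRAINT pk_fact_inventory_daily PRIMARY KEY (date_key, product_key)
);
GO

-- Month-end closing stock: pick the last snapshot of each month
-- (date_key / 100 is integer division and yields YYYYMM) instead of summing the days.
SELECT s.date_key,
       SUM(s.quantity_on_hand) AS closing_quantity
FROM dbo.fact_inventory_daily s
WHERE s.date_key IN (
          SELECT MAX(date_key)
          FROM dbo.fact_inventory_daily
          GROUP BY date_key / 100
      )
GROUP BY s.date_key
ORDER BY s.date_key;
GO
```

A transaction fact table, by contrast, only gets a row when an event occurs, which is why its measures can usually be summed across every dimension.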
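Dimension Table
Provides the perspectives from which the business is analyzed (who, what, when, where, why).
Characteristics:
✓ Mostly textual, descriptive columns
✓ Relatively few rows (typically hundreds to tens of thousands)
✓ Rich in descriptive attributes
✓ Shared by multiple fact tables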
SQL in Practice: Creating the Star Schema Layer
Date Dimension Table
sql
-- ============================================================
-- Create the dimension tables in star_db
-- ============================================================
USE star_db;
GO
-- Date dimension table
IF OBJECT_ID('dbo.dim_date', 'U') IS NOT NULL DROP TABLE dbo.dim_date;
GO
CREATE TABLE dbo.dim_date (
date_key INT NOT NULL PRIMARY KEY, -- format: YYYYMMDD
date_value DATE NOT NULL UNIQUE,
year SMALLINT NOT NULL,
quarter SMALLINT NOT NULL, -- 1-4
quarter_name VARCHAR(10) NOT NULL, -- Q1/Q2/Q3/Q4
month SMALLINT NOT NULL, -- 1-12
month_name NVARCHAR(20) NOT NULL,
month_name_short VARCHAR(10) NOT NULL,
week_of_year SMALLINT NOT NULL,
day_of_week SMALLINT NOT NULL, -- 1-7
day_name NVARCHAR(10) NOT NULL,
day_of_month SMALLINT NOT NULL, -- 1-31
day_of_year SMALLINT NOT NULL, -- 1-366
is_weekend BIT NOT NULL,
is_holiday BIT DEFAULT 0,
fiscal_year SMALLINT NULL,
fiscal_quarter SMALLINT NULL
);
GO
CREATE NONCLUSTERED INDEX idx_dim_date_year ON dbo.dim_date(year);
CREATE NONCLUSTERED INDEX idx_dim_date_month ON dbo.dim_date(year, month);
GO
Populate the date dimension (2023-01-01 to 2025-12-31):
sql
-- ============================================================
-- Populate the date dimension (stored procedure with a loop)
-- ============================================================
USE star_db;
GO
IF OBJECT_ID('dbo.sp_populate_dim_date', 'P') IS NOT NULL
DROP PROCEDURE dbo.sp_populate_dim_date;
GO
CREATE PROCEDURE dbo.sp_populate_dim_date
@start_date DATE = '2023-01-01',
@end_date DATE = '2025-12-31'
AS
BEGIN
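SET NOCOUNT ON;
SET DATEFIRST 7; -- pin Sunday as day 1 so day_of_week and is_weekend do not depend on session settings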
DECLARE @cur_date DATE = @start_date;
DECLARE @y INT, @m INT, @q INT;
WHILE @cur_date <= @end_date
BEGIN
SET @y = YEAR(@cur_date);
SET @m = MONTH(@cur_date);
SET @q = CAST(CEILING(CAST(@m AS FLOAT) / 3) AS SMALLINT);
INSERT INTO dbo.dim_date (
date_key, date_value, year, quarter, quarter_name,
month, month_name, month_name_short,
week_of_year, day_of_week, day_name,
day_of_month, day_of_year, is_weekend,
fiscal_year, fiscal_quarter
)
VALUES (
@y * 10000 + @m * 100 + DAY(@cur_date), -- date_key: YYYYMMDD
@cur_date,
@y,
@q,
'Q' + CAST(@q AS VARCHAR),
@m,
DATENAME(month, @cur_date), -- January, February...
LEFT(DATENAME(month, @cur_date), 3), -- Jan, Feb...
DATEPART(week, @cur_date),
DATEPART(weekday, @cur_date), -- 1 = Sunday, 7 = Saturday when DATEFIRST is 7 (the us_english default)
DATENAME(weekday, @cur_date), -- Monday, Tuesday...
DAY(@cur_date),
DATEPART(dayofyear, @cur_date),
CASE WHEN DATEPART(weekday, @cur_date) IN (1, 7) THEN 1 ELSE 0 END,
@y, -- fiscal year defaults to the calendar year
@q
);
SET @cur_date = DATEADD(day, 1, @cur_date);
END;
PRINT N'Date dimension populated: ' + CAST(@start_date AS VARCHAR(10)) + N' ~ ' + CAST(@end_date AS VARCHAR(10));
END;
GO
-- Run the population procedure
EXEC dbo.sp_populate_dim_date;
GO
-- Verify
SELECT COUNT(*) AS total_dates,
MIN(date_value) AS min_date,
MAX(date_value) AS max_date
FROM dbo.dim_date;
GO
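The verification above only reports the row count and the range. As a complementary check, the following sketch generates the expected calendar with a set-based tally (no loop) and lists any day missing from `dim_date`. It assumes the default range of the procedure, 2023-01-01 to 2025-12-31, which is 1,096 days because 2024 is a leap year; the CTE names `n` and `days` are illustrative.

```sql
USE star_db;
GO
DECLARE @start_date DATE = '2023-01-01',
        @end_date   DATE = '2025-12-31';

-- Build one integer per day from a cross join of a system catalog view,
-- then turn each integer into a calendar date.
WITH n AS (
    SELECT TOP (DATEDIFF(day, @start_date, @end_date) + 1)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS INT) AS i
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
),
days AS (
    SELECT DATEADD(day, i, @start_date) AS dt
    FROM n
)
-- Days that should exist but are not in dim_date (expect no rows).
SELECT days.dt AS missing_date
FROM days
WHERE NOT EXISTS (
          SELECT 1 FROM dbo.dim_date d WHERE d.date_value = days.dt
      )
ORDER BY days.dt;
GO
```

The same tally could feed a single INSERT ... SELECT in place of the WHILE loop; for about a thousand rows either approach is fast enough, so the loop is mostly a readability choice.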
Customer Dimension Table (SCD Type 2)
sql
USE star_db;
GO
-- Customer dimension table
IF OBJECT_ID('dbo.dim_customer', 'U') IS NOT NULL DROP TABLE dbo.dim_customer;
GO
CREATE TABLE dbo.dim_customer (
customer_key BIGINT IDENTITY(1,1) PRIMARY KEY,
customer_id VARCHAR(50) NOT NULL,
-- SCD columns
load_date DATETIME NOT NULL,
end_date DATETIME NULL, -- NULL = currently valid
is_current BIT DEFAULT 1,
-- Business attributes
customer_name NVARCHAR(100) NOT NULL,
customer_type VARCHAR(20) NULL, -- individual/enterprise
email VARCHAR(100) NULL,
phone VARCHAR(20) NULL,
address NVARCHAR(200) NULL,
city NVARCHAR(50) NULL,
region NVARCHAR(50) NULL,
country NVARCHAR(50) DEFAULT N'中国', -- default value: China
register_date DATE NULL,
is_active BIT NULL
);
GO
CREATE NONCLUSTERED INDEX idx_dim_customer_id ON dbo.dim_customer(customer_id);
CREATE NONCLUSTERED INDEX idx_dim_customer_curr ON dbo.dim_customer(customer_id)
WHERE is_current = 1;
GO
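Once a few versions have accumulated, the SCD columns can be read in two ways: the full history of a customer, or the version that was valid at a given moment. The sketch below shows both; the customer ID `'C001'` and the date `'2024-06-30'` are hypothetical values chosen for illustration.

```sql
USE star_db;
GO
DECLARE @customer_id VARCHAR(50) = 'C001';        -- hypothetical business key
DECLARE @as_of       DATETIME    = '2024-06-30';  -- hypothetical point in time

-- Full version history of one customer, oldest first
SELECT customer_key, customer_name, city, region,
       load_date, end_date, is_current
FROM dbo.dim_customer
WHERE customer_id = @customer_id
ORDER BY load_date;

-- The version that was valid at @as_of
SELECT customer_key, customer_name, city, region,
       load_date, end_date
FROM dbo.dim_customer
WHERE customer_id = @customer_id
  AND load_date <= @as_of
  AND (end_date IS NULL OR end_date > @as_of);
GO
```

The point-in-time query relies on the validity windows not overlapping. The load procedure later in this post closes an old version with GETDATE() but starts the new one at psa_load_time, so two versions can overlap briefly; treat a result of more than one row as a sign of that.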
Product Dimension Table (SCD Type 2)
sql
USE star_db;
GO
-- Product dimension table
IF OBJECT_ID('dbo.dim_product', 'U') IS NOT NULL DROP TABLE dbo.dim_product;
GO
CREATE TABLE dbo.dim_product (
product_key BIGINT IDENTITY(1,1) PRIMARY KEY,
product_id VARCHAR(50) NOT NULL,
load_date DATETIME NOT NULL,
end_date DATETIME NULL,
is_current BIT DEFAULT 1,
product_name NVARCHAR(200) NOT NULL,
category NVARCHAR(50) NOT NULL,
sub_category NVARCHAR(50) NULL,
brand NVARCHAR(50) NULL,
supplier_id VARCHAR(50) NULL,
unit_cost DECIMAL(10,2) NULL,
unit_price DECIMAL(10,2) NULL,
is_active BIT NULL
);
GO
CREATE NONCLUSTERED INDEX idx_dim_product_id ON dbo.dim_product(product_id);
CREATE NONCLUSTERED INDEX idx_dim_product_curr ON dbo.dim_product(product_id)
WHERE is_current = 1;
CREATE NONCLUSTERED INDEX idx_dim_product_cat ON dbo.dim_product(category)
WHERE is_current = 1;
GO
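A Type 2 dimension is only usable if each business key has at most one current row. The table definition does not enforce that by itself, so the following sketch shows a check query and, optionally, a unique filtered index that makes the rule a constraint. The index name `ux_dim_product_one_current` is hypothetical.

```sql
USE star_db;
GO
-- Business keys with more than one current row (expect no rows)
SELECT product_id, COUNT(*) AS current_rows
FROM dbo.dim_product
WHERE is_current = 1
GROUP BY product_id
HAVING COUNT(*) > 1;
GO

-- Optional guard: reject a second current row for the same product_id.
-- This works with a load that closes the old version before inserting the new one.
CREATE UNIQUE NONCLUSTERED INDEX ux_dim_product_one_current
    ON dbo.dim_product(product_id)
    WHERE is_current = 1;
GO
```

If the unique index is adopted, the non-unique filtered index on the same column becomes redundant and could be dropped.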
Order Status Dimension Table (Small Lookup Dimension)
sql
USE star_db;
GO
-- Order status dimension (a small lookup dimension with its own table; not a degenerate dimension)
IF OBJECT_ID('dbo.dim_order_status', 'U') IS NOT NULL DROP TABLE dbo.dim_order_status;
GO
CREATE TABLE dbo.dim_order_status (
status_key INT NOT NULL PRIMARY KEY,
status_code VARCHAR(20) NOT NULL UNIQUE,
status_name NVARCHAR(50) NOT NULL,
is_active_status BIT NOT NULL -- whether the status counts as active
);
GO
INSERT INTO dbo.dim_order_status (status_key, status_code, status_name, is_active_status) VALUES
(1, 'pending', N'待处理', 1),
(2, 'confirmed', N'已确认', 1),
(3, 'shipped', N'已发货', 1),
(4, 'cancelled', N'已取消', 0),
(5, 'returned', N'已退货', 0);
GO
Sales Fact Table
sql
USE star_db;
GO
-- Sales fact table (transaction fact table)
IF OBJECT_ID('dbo.fact_sales', 'U') IS NOT NULL DROP TABLE dbo.fact_sales;
GO
CREATE TABLE dbo.fact_sales (
fact_sales_id BIGINT IDENTITY(1,1) PRIMARY KEY,
-- Surrogate keys (reference the dimension tables)
date_key INT NOT NULL,
customer_key BIGINT NULL,
product_key BIGINT NULL,
status_key INT NULL,
-- Degenerate dimension
order_id VARCHAR(50) NOT NULL,
-- Measures
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
total_amount DECIMAL(14,2) NOT NULL,
discount_amount DECIMAL(10,2) DEFAULT 0,
-- Cost
unit_cost DECIMAL(10,2) NULL,
total_cost DECIMAL(14,2) NULL,
-- Derived metrics
gross_profit DECIMAL(14,2) NULL, -- total_amount - total_cost
profit_margin DECIMAL(5,2) NULL, -- (gross_profit / total_amount) * 100
-- Metadata
etl_batch_id VARCHAR(50) NULL,
load_time DATETIME DEFAULT GETDATE()
);
GO
CREATE NONCLUSTERED INDEX idx_fact_sales_date ON dbo.fact_sales(date_key);
CREATE NONCLUSTERED INDEX idx_fact_sales_customer ON dbo.fact_sales(customer_key)
WHERE customer_key IS NOT NULL;
CREATE NONCLUSTERED INDEX idx_fact_sales_product ON dbo.fact_sales(product_key)
WHERE product_key IS NOT NULL;
CREATE NONCLUSTERED INDEX idx_fact_sales_order ON dbo.fact_sales(order_id);
GO
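The two derived columns follow the formulas given in the column comments. As a worked example with made-up numbers (3 units sold at 100.00 each, with a unit cost of 60.00), the arithmetic can be checked without touching any table:

```sql
-- Worked example of the derived metrics; the input values are illustrative only.
SELECT
    v.quantity * v.unit_price                            AS total_amount,
    v.quantity * v.unit_cost                             AS total_cost,
    v.quantity * v.unit_price - v.quantity * v.unit_cost AS gross_profit,
    CAST(ROUND(
        (v.quantity * v.unit_price - v.quantity * v.unit_cost)
        / (v.quantity * v.unit_price) * 100, 2) AS DECIMAL(5,2)) AS profit_margin
FROM (VALUES (3, CAST(100.00 AS DECIMAL(10,2)), CAST(60.00 AS DECIMAL(10,2))))
     AS v(quantity, unit_price, unit_cost);
-- total_amount = 300.00, total_cost = 180.00, gross_profit = 120.00, profit_margin = 40.00
```

Note that DECIMAL(5,2) holds values between -999.99 and 999.99, so an order whose cost exceeds roughly eleven times its revenue would overflow the profit_margin column.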
Star Schema ETL Flow
Loading the Customer Dimension (SCD Type 2)
sql
USE star_db;
GO
IF OBJECT_ID('dbo.sp_load_dim_customer', 'P') IS NOT NULL
DROP PROCEDURE dbo.sp_load_dim_customer;
GO
CREATE PROCEDURE dbo.sp_load_dim_customer
@batch_id VARCHAR(50)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @start_time DATETIME = GETDATE();
DECLARE @rows_inserted BIGINT = 0;
DECLARE @rows_updated BIGINT = 0;
DECLARE @error_msg NVARCHAR(MAX);
BEGIN TRY
INSERT INTO etl_db.dbo.etl_log (
batch_id, layer_name, db_name, table_name, start_time, status
) VALUES (
@batch_id, 'star', 'star_db', 'dim_customer', @start_time, 'RUNNING'
);
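-- Step 1: close the old version of every record that has changed
-- Take the latest customer data from the PSA and compare it with the current dimension rows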
UPDATE d
SET d.end_date = GETDATE(), d.is_current = 0
FROM dbo.dim_customer d
INNER JOIN (
SELECT customer_id, customer_name, city, region, address,
customer_type, is_active, email, phone,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY psa_load_time DESC) AS rn
FROM psa_db.dbo.customers
) p ON d.customer_id = p.customer_id
WHERE d.is_current = 1
AND d.end_date IS NULL
AND p.rn = 1
AND (
d.customer_name <> p.customer_name
OR ISNULL(d.city, '') <> ISNULL(p.city, '')
OR ISNULL(d.region, '') <> ISNULL(p.region, '')
OR ISNULL(d.address, '') <> ISNULL(p.address, '')
);
SET @rows_updated = @@ROWCOUNT;
-- Step 2: insert new versions and brand-new records
INSERT INTO dbo.dim_customer (
customer_id, load_date, end_date, is_current,
customer_name, customer_type, email, phone,
address, city, region, country,
register_date, is_active
)
SELECT
p.customer_id, p.psa_load_time, NULL, 1,
p.customer_name, p.customer_type, p.email, p.phone,
p.address, p.city, p.region, N'中国',
p.register_date, p.is_active
FROM (
SELECT customer_id, customer_name, customer_type, email, phone,
address, city, region, register_date, is_active, psa_load_time,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY psa_load_time DESC) AS rn
FROM psa_db.dbo.customers
) p
WHERE p.rn = 1
AND NOT EXISTS (
SELECT 1 FROM dbo.dim_customer d
WHERE d.customer_id = p.customer_id
AND d.is_current = 1
AND d.customer_name = p.customer_name
AND ISNULL(d.city, '') = ISNULL(p.city, '')
AND ISNULL(d.region, '') = ISNULL(p.region, '')
AND ISNULL(d.address, '') = ISNULL(p.address, '')
);
SET @rows_inserted = @@ROWCOUNT;
UPDATE etl_db.dbo.etl_log
SET end_time = GETDATE(), rows_inserted = @rows_inserted,
rows_updated = @rows_updated, status = 'SUCCESS'
WHERE batch_id = @batch_id AND db_name = 'star_db' AND table_name = 'dim_customer';
PRINT N'dim_customer: inserted ' + CAST(@rows_inserted AS VARCHAR(20))
+ N' rows, closed ' + CAST(@rows_updated AS VARCHAR(20)) + N' old versions';
END TRY
BEGIN CATCH
SET @error_msg = ERROR_MESSAGE();
UPDATE etl_db.dbo.etl_log
SET end_time = GETDATE(), status = 'FAILED', error_message = @error_msg
WHERE batch_id = @batch_id AND db_name = 'star_db' AND table_name = 'dim_customer';
THROW;
END CATCH;
END;
GO
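Before running the load it can help to preview which current rows would be versioned. The sketch below is a read-only dry run that reuses the same source and the same four tracked columns as the procedure, but detects changes with the EXISTS / EXCEPT idiom instead of ISNULL placeholders. EXCEPT compares NULLs as equal, so it needs no sentinel value; unlike ISNULL(x, ''), it also treats NULL and an empty string as different, which is a small behavioral difference to be aware of.

```sql
USE star_db;
GO
-- Dry run: current customer rows that would receive a new version on the next load
SELECT d.customer_key,
       d.customer_id,
       d.customer_name AS current_name, p.customer_name AS incoming_name,
       d.city          AS current_city, p.city          AS incoming_city
FROM dbo.dim_customer d
INNER JOIN (
    SELECT customer_id, customer_name, city, region, address,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY psa_load_time DESC) AS rn
    FROM psa_db.dbo.customers
) p ON p.customer_id = d.customer_id
   AND p.rn = 1
WHERE d.is_current = 1
  AND EXISTS (
          -- Returns a row only when at least one tracked column differs
          SELECT p.customer_name, p.city, p.region, p.address
          EXCEPT
          SELECT d.customer_name, d.city, d.region, d.address
      );
GO
```

The number of rows returned should match the "closed old versions" count that the procedure prints when it runs against the same PSA state.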
Loading the Product Dimension (SCD Type 2)
sql
USE star_db;
GO
IF OBJECT_ID('dbo.sp_load_dim_product', 'P') IS NOT NULL
DROP PROCEDURE dbo.sp_load_dim_product;
GO
CREATE PROCEDURE dbo.sp_load_dim_product
@batch_id VARCHAR(50)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @start_time DATETIME = GETDATE();
DECLARE @rows_inserted BIGINT = 0;
DECLARE @rows_updated BIGINT = 0;
DECLARE @error_msg NVARCHAR(MAX);
BEGIN TRY
INSERT INTO etl_db.dbo.etl_log (
batch_id, layer_name, db_name, table_name, start_time, status
) VALUES (
@batch_id, 'star', 'star_db', 'dim_product', @start_time, 'RUNNING'
);
-- Step 1: close old versions (tracked columns: product_name, unit_price, unit_cost)
UPDATE d
SET d.end_date = GETDATE(), d.is_current = 0
FROM dbo.dim_product d
INNER JOIN (
SELECT product_id, product_name, category, sub_category, brand,
unit_cost, unit_price, supplier_id, is_active,
ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY psa_load_time DESC) AS rn
FROM psa_db.dbo.products
) p ON d.product_id = p.product_id
WHERE d.is_current = 1
AND d.end_date IS NULL
AND p.rn = 1
AND (
d.product_name <> p.product_name
OR ISNULL(d.unit_price, 0) <> ISNULL(p.unit_price, 0)
OR ISNULL(d.unit_cost, 0) <> ISNULL(p.unit_cost, 0)
);
SET @rows_updated = @@ROWCOUNT;
-- Step 2: insert new versions
INSERT INTO dbo.dim_product (
product_id, load_date, end_date, is_current,
product_name, category, sub_category, brand,
supplier_id, unit_cost, unit_price, is_active
)
SELECT
p.product_id, p.psa_load_time, NULL, 1,
p.product_name, p.category, p.sub_category, p.brand,
p.supplier_id, p.unit_cost, p.unit_price, p.is_active
FROM (
SELECT product_id, product_name, category, sub_category, brand,
unit_cost, unit_price, supplier_id, is_active, psa_load_time,
ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY psa_load_time DESC) AS rn
FROM psa_db.dbo.products
) p
WHERE p.rn = 1
AND NOT EXISTS (
SELECT 1 FROM dbo.dim_product d
WHERE d.product_id = p.product_id
AND d.is_current = 1
AND d.product_name = p.product_name
AND ISNULL(d.unit_price, 0) = ISNULL(p.unit_price, 0)
AND ISNULL(d.unit_cost, 0) = ISNULL(p.unit_cost, 0)
);
SET @rows_inserted = @@ROWCOUNT;
UPDATE etl_db.dbo.etl_log
SET end_time = GETDATE(), rows_inserted = @rows_inserted,
rows_updated = @rows_updated, status = 'SUCCESS'
WHERE batch_id = @batch_id AND db_name = 'star_db' AND table_name = 'dim_product';
PRINT N'dim_product: inserted ' + CAST(@rows_inserted AS VARCHAR(20))
+ N' rows, closed ' + CAST(@rows_updated AS VARCHAR(20)) + N' old versions';
END TRY
BEGIN CATCH
SET @error_msg = ERROR_MESSAGE();
UPDATE etl_db.dbo.etl_log
SET end_time = GETDATE(), status = 'FAILED', error_message = @error_msg
WHERE batch_id = @batch_id AND db_name = 'star_db' AND table_name = 'dim_product';
THROW;
END CATCH;
END;
GO
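The procedure versions a product only when its name, price, or cost changes. A change to category, sub_category, brand, supplier_id, or is_active alone is therefore never written to the dimension. One possible way to handle those columns, sketched here as an assumption rather than as part of the original design, is a Type 1 overwrite of the current row:

```sql
USE star_db;
GO
-- Type 1 overwrite (sketch): refresh the untracked attributes on the current row in place.
-- Assumes psa_db.dbo.products.category is never NULL, because dim_product.category is NOT NULL.
UPDATE d
SET d.category     = p.category,
    d.sub_category = p.sub_category,
    d.brand        = p.brand,
    d.supplier_id  = p.supplier_id,
    d.is_active    = p.is_active
FROM dbo.dim_product d
INNER JOIN (
    SELECT product_id, category, sub_category, brand, supplier_id, is_active,
           ROW_NUMBER() OVER (PARTITION BY product_id
                              ORDER BY psa_load_time DESC) AS rn
    FROM psa_db.dbo.products
) p ON p.product_id = d.product_id
   AND p.rn = 1
WHERE d.is_current = 1
  AND EXISTS (
          -- Only touch rows where at least one of these attributes differs
          SELECT p.category, p.sub_category, p.brand, p.supplier_id, p.is_active
          EXCEPT
          SELECT d.category, d.sub_category, d.brand, d.supplier_id, d.is_active
      );
GO
```

If history matters for any of these columns, the alternative is to add them to the comparison lists in both Step 1 and Step 2 so that they trigger a new version instead.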
Loading the Sales Fact Table
sql
USE star_db;
GO
IF OBJECT_ID('dbo.sp_load_fact_sales', 'P') IS NOT NULL
DROP PROCEDURE dbo.sp_load_fact_sales;
GO
CREATE PROCEDURE dbo.sp_load_fact_sales
@batch_id VARCHAR(50)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @start_time DATETIME = GETDATE();
DECLARE @rows_inserted BIGINT = 0;
DECLARE @error_msg NVARCHAR(MAX);
BEGIN TRY
INSERT INTO etl_db.dbo.etl_log (
batch_id, layer_name, db_name, table_name, start_time, status
) VALUES (
@batch_id, 'star', 'star_db', 'fact_sales', @start_time, 'RUNNING'
);
-- Incremental load: insert only orders that are not yet in the fact table
INSERT INTO dbo.fact_sales (
date_key, customer_key, product_key, status_key,
order_id,
quantity, unit_price, total_amount, discount_amount,
unit_cost, total_cost,
gross_profit, profit_margin,
etl_batch_id, load_time
)
SELECT
-- Date key
YEAR(o.order_date) * 10000 + MONTH(o.order_date) * 100 + DAY(o.order_date),
-- Customer surrogate key (current version)
(SELECT MAX(c.customer_key) FROM dbo.dim_customer c
WHERE c.customer_id = o.customer_id AND c.is_current = 1),
-- Product surrogate key (current version)
(SELECT MAX(p.product_key) FROM dbo.dim_product p
WHERE p.product_id = o.product_id AND p.is_current = 1),
-- Status key
(SELECT s.status_key FROM dbo.dim_order_status s
WHERE s.status_code = o.status),
-- Order number (degenerate dimension)
o.order_id,
-- Measures
o.quantity,
o.unit_price,
o.total_amount,
0,
-- Cost (taken from the current product dimension row)
(SELECT TOP 1 dp.unit_cost FROM dbo.dim_product dp
WHERE dp.product_id = o.product_id AND dp.is_current = 1),
(SELECT TOP 1 ISNULL(dp.unit_cost, 0) FROM dbo.dim_product dp
WHERE dp.product_id = o.product_id AND dp.is_current = 1) * o.quantity,
-- Derived metrics
o.total_amount - (
(SELECT TOP 1 ISNULL(dp.unit_cost, 0) FROM dbo.dim_product dp
WHERE dp.product_id = o.product_id AND dp.is_current = 1) * o.quantity
),
CASE
WHEN o.total_amount > 0 THEN
CAST(ROUND(((
o.total_amount - (
(SELECT TOP 1 ISNULL(dp.unit_cost, 0) FROM dbo.dim_product dp
WHERE dp.product_id = o.product_id AND dp.is_current = 1) * o.quantity
)
) / o.total_amount * 100), 2) AS DECIMAL(5,2))
ELSE 0
END,
-- Metadata
@batch_id,
GETDATE()
FROM (
-- Take the latest version of each order from the PSA
SELECT order_id, customer_id, product_id, order_date,
quantity, unit_price, total_amount, status, psa_load_time,
ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY psa_load_time DESC) AS rn
FROM psa_db.dbo.orders
) o
WHERE o.rn = 1
AND NOT EXISTS (
SELECT 1 FROM dbo.fact_sales f WHERE f.order_id = o.order_id
);
SET @rows_inserted = @@ROWCOUNT;
-- Update the status of existing orders (when the status has changed)
UPDATE f
SET f.status_key = (SELECT s.status_key FROM dbo.dim_order_status s WHERE s.status_code = o.status),
f.etl_batch_id = @batch_id,
f.load_time = GETDATE()
FROM dbo.fact_sales f
INNER JOIN (
SELECT order_id, status,
ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY psa_load_time DESC) AS rn
FROM psa_db.dbo.orders
) o ON f.order_id = o.order_id
WHERE o.rn = 1
AND f.status_key <> (SELECT s.status_key FROM dbo.dim_order_status s WHERE s.status_code = o.status);
UPDATE etl_db.dbo.etl_log
SET end_time = GETDATE(), rows_inserted = @rows_inserted, status = 'SUCCESS'
WHERE batch_id = @batch_id AND db_name = 'star_db' AND table_name = 'fact_sales';
PRINT N'fact_sales loaded: ' + CAST(@rows_inserted AS VARCHAR(20)) + N' rows';
END TRY
BEGIN CATCH
SET @error_msg = ERROR_MESSAGE();
UPDATE etl_db.dbo.etl_log
SET end_time = GETDATE(), status = 'FAILED', error_message = @error_msg
WHERE batch_id = @batch_id AND db_name = 'star_db' AND table_name = 'fact_sales';
THROW;
END CATCH;
END;
GO
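The INSERT above looks up the product cost with the same correlated subquery four times. As a sketch of an equivalent shape, the following read-only preview resolves each dimension with one LEFT JOIN and computes total_cost once in a CROSS APPLY. It assumes at most one current row per customer_id and product_id; where the procedure uses MAX and TOP 1 to tolerate duplicates, this version would return duplicate rows instead.

```sql
USE star_db;
GO
-- Preview of the rows the next fact load would insert (nothing is written)
SELECT
    o.order_id,
    YEAR(o.order_date) * 10000 + MONTH(o.order_date) * 100 + DAY(o.order_date) AS date_key,
    c.customer_key,
    p.product_key,
    s.status_key,
    o.quantity,
    o.unit_price,
    o.total_amount,
    p.unit_cost,
    k.total_cost,
    o.total_amount - k.total_cost AS gross_profit,
    CASE WHEN o.total_amount > 0
         THEN CAST(ROUND((o.total_amount - k.total_cost)
                         / o.total_amount * 100, 2) AS DECIMAL(5,2))
         ELSE 0
    END AS profit_margin
FROM (
    SELECT order_id, customer_id, product_id, order_date,
           quantity, unit_price, total_amount, status,
           ROW_NUMBER() OVER (PARTITION BY order_id
                              ORDER BY psa_load_time DESC) AS rn
    FROM psa_db.dbo.orders
) o
LEFT JOIN dbo.dim_customer     c ON c.customer_id = o.customer_id AND c.is_current = 1
LEFT JOIN dbo.dim_product      p ON p.product_id  = o.product_id  AND p.is_current = 1
LEFT JOIN dbo.dim_order_status s ON s.status_code = o.status
CROSS APPLY (
    -- NULL when the product is unknown, 0-based when the product exists without a cost,
    -- which mirrors what the scalar subqueries in the procedure return
    SELECT CASE WHEN p.product_key IS NULL THEN NULL
                ELSE ISNULL(p.unit_cost, 0) * o.quantity
           END AS total_cost
) k
WHERE o.rn = 1
  AND NOT EXISTS (SELECT 1 FROM dbo.fact_sales f WHERE f.order_id = o.order_id);
GO
```

Rows with a NULL customer_key, product_key, or status_key in this preview are the ones that would land in the fact table without a matching dimension row.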
Star Schema Master ETL Orchestration
sql
USE star_db;
GO
IF OBJECT_ID('dbo.sp_run_star_etl', 'P') IS NOT NULL
DROP PROCEDURE dbo.sp_run_star_etl;
GO
CREATE PROCEDURE dbo.sp_run_star_etl
@batch_id VARCHAR(50)
AS
BEGIN
SET NOCOUNT ON;
PRINT N'=== Star Schema ETL started ===';
PRINT N'Batch ID: ' + @batch_id;
-- Load the dimensions first (the fact table depends on their surrogate keys)
EXEC dbo.sp_load_dim_customer @batch_id;
EXEC dbo.sp_load_dim_product @batch_id;
-- Then load the fact table
EXEC dbo.sp_load_fact_sales @batch_id;
PRINT N'=== Star Schema ETL finished ===';
END;
GO
Running the ETL and Verifying the Result
sql
-- ============================================================
-- Run the Star Schema ETL
-- ============================================================
DECLARE @batch_id VARCHAR(50);
SET @batch_id = 'BATCH_STAR_' + CONVERT(VARCHAR(8), GETDATE(), 112) + '_'
+ REPLACE(CONVERT(VARCHAR(8), GETDATE(), 108), ':', '');
EXEC star_db.dbo.sp_run_star_etl @batch_id;
GO
-- ============================================================
-- Verify the result
-- ============================================================
USE star_db;
GO
-- Inspect the ETL log
SELECT batch_id, table_name, start_time, end_time,
DATEDIFF(second, start_time, end_time) AS duration_sec,
rows_inserted, rows_updated, status
FROM etl_db.dbo.etl_log
WHERE layer_name = 'star'
ORDER BY start_time;
GO
-- Row counts per table
SELECT 'dim_date' AS tbl, COUNT(*) AS cnt FROM dbo.dim_date
UNION ALL SELECT 'dim_customer', COUNT(*) FROM dbo.dim_customer
UNION ALL SELECT 'dim_product', COUNT(*) FROM dbo.dim_product
UNION ALL SELECT 'dim_order_status', COUNT(*) FROM dbo.dim_order_status
UNION ALL SELECT 'fact_sales', COUNT(*) FROM dbo.fact_sales;
GO
-- Sample rows from the fact table
SELECT TOP 5
f.order_id,
d.date_value AS order_date,
c.customer_name,
p.product_name,
f.quantity,
f.total_amount,
f.gross_profit,
f.profit_margin
FROM dbo.fact_sales f
LEFT JOIN dbo.dim_date d ON f.date_key = d.date_key
LEFT JOIN dbo.dim_customer c ON f.customer_key = c.customer_key
LEFT JOIN dbo.dim_product p ON f.product_key = p.product_key
ORDER BY f.fact_sales_id;
GO
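Row counts alone do not show whether every fact row resolves to its dimensions. Since fact_sales declares no foreign key constraints and dim_date only covers the populated range, a quick orphan check is a useful extra validation. The query below is a sketch; the output column names are illustrative.

```sql
USE star_db;
GO
-- Count fact rows whose keys are missing or do not resolve to a dimension row
SELECT
    COUNT(*)                                                   AS fact_rows,
    SUM(CASE WHEN d.date_key     IS NULL THEN 1 ELSE 0 END)    AS date_not_in_dim_date,
    SUM(CASE WHEN f.customer_key IS NULL THEN 1 ELSE 0 END)    AS customer_key_missing,
    SUM(CASE WHEN f.product_key  IS NULL THEN 1 ELSE 0 END)    AS product_key_missing,
    SUM(CASE WHEN f.status_key   IS NULL THEN 1 ELSE 0 END)    AS status_key_missing
FROM dbo.fact_sales f
LEFT JOIN dbo.dim_date d ON d.date_key = f.date_key;
GO
```

A non-zero date_not_in_dim_date usually means an order falls outside the range passed to sp_populate_dim_date, and those rows would silently disappear from any query that uses an INNER JOIN to dim_date.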
Star Schema Analytical Queries
sql
-- ============================================================
-- Q1: monthly sales trend
-- ============================================================
USE star_db;
GO
SELECT
d.year,
d.month,
d.month_name_short,
SUM(f.total_amount) AS total_sales,
SUM(f.quantity) AS total_quantity,
COUNT(DISTINCT f.order_id) AS order_count,
ROUND(AVG(f.profit_margin), 2) AS avg_margin
FROM dbo.fact_sales f
INNER JOIN dbo.dim_date d ON f.date_key = d.date_key
GROUP BY d.year, d.month, d.month_name_short
ORDER BY d.year, d.month;
GO
-- ============================================================
-- Q2: sales analysis by customer type
-- ============================================================
SELECT
c.customer_type,
COUNT(DISTINCT f.order_id) AS order_count,
SUM(f.total_amount) AS total_sales,
ROUND(AVG(f.profit_margin), 2) AS avg_margin
FROM dbo.fact_sales f
INNER JOIN dbo.dim_customer c ON f.customer_key = c.customer_key
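-- No is_current filter: fact rows keep the surrogate key of the version that was current at load time,
-- so filtering on is_current = 1 would silently drop sales tied to older customer versions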
GROUP BY c.customer_type
ORDER BY total_sales DESC;
GO
-- ============================================================
-- Q3: sales ranking by product category
-- ============================================================
SELECT
p.category,
p.product_name,
COUNT(DISTINCT f.order_id) AS order_count,
SUM(f.quantity) AS total_quantity,
SUM(f.total_amount) AS total_sales,
SUM(f.gross_profit) AS total_profit
FROM dbo.fact_sales f
INNER JOIN dbo.dim_product p ON f.product_key = p.product_key
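-- No is_current filter: it would drop sales recorded against older product versions
-- (a renamed product therefore appears once per name it was sold under)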
GROUP BY p.category, p.product_name
ORDER BY total_sales DESC;
GO
-- ============================================================
-- Q4: sales analysis by region
-- ============================================================
SELECT
c.region,
COUNT(DISTINCT f.order_id) AS order_count,
SUM(f.total_amount) AS total_sales
FROM dbo.fact_sales f
INNER JOIN dbo.dim_customer c ON f.customer_key = c.customer_key
-- No is_current filter: sales are attributed to the region the customer had when the order was loaded
GROUP BY c.region
ORDER BY total_sales DESC;
GO
-- ============================================================
-- Q5: order distribution by order status
-- ============================================================
SELECT
s.status_name,
COUNT(*) AS order_count,
SUM(f.total_amount) AS total_sales
FROM dbo.fact_sales f
INNER JOIN dbo.dim_order_status s ON f.status_key = s.status_key
GROUP BY s.status_name
ORDER BY order_count DESC;
GO
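The dimension joins above go through the surrogate key, so each fact row is tied to the specific customer version it was loaded with. When a report should instead show all history under the customer's current attributes, one option is to hop from that version to the current row through the business key. The sketch below does this for the region analysis and also computes a revenue-weighted margin, since AVG(profit_margin) gives a small order the same weight as a large one. The aliases `hist` and `cur` are illustrative.

```sql
USE star_db;
GO
-- Sales by the customer's CURRENT region, with a revenue-weighted margin
SELECT
    cur.region,
    COUNT(DISTINCT f.order_id) AS order_count,
    SUM(f.total_amount)        AS total_sales,
    ROUND(SUM(f.gross_profit) / NULLIF(SUM(f.total_amount), 0) * 100, 2) AS weighted_margin
FROM dbo.fact_sales f
INNER JOIN dbo.dim_customer hist            -- the version stored on the fact row
        ON hist.customer_key = f.customer_key
INNER JOIN dbo.dim_customer cur             -- the current version of the same customer
        ON cur.customer_id = hist.customer_id
       AND cur.is_current = 1
GROUP BY cur.region
ORDER BY total_sales DESC;
GO
```

This assumes every customer has exactly one current row; a customer with none drops out of the result, and one with several is counted more than once.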
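Star Schema Compared with Data Vault
| Aspect | Star Schema | Data Vault |
|---|---|---|
| Design philosophy | Analysis-oriented, performance first | Business-oriented, agile and traceable |
| Structure | Fact tables + dimension tables | Hub + Link + Satellite |
| Primary key | Surrogate key (auto-increment integer) | Business key or hash key |
| History tracking | Optional (via SCD) | Native (every record carries a load timestamp) |
| Query performance | Generally fast (few joins) | Slower (many joins) |
| Adaptability to business change | Weaker (changes require remodeling) | Very strong (insert-only) |
| Typical use | Mature, stable business analytics | Fast-changing business environments |
| When to use | Mature phase of the data warehouse | Build-out and middle phases of the data warehouse |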
A Classic Star Schema Pattern
┌───────────────────────────────────────────────────────────────┐
│                  Sales Analysis Star Schema                   │
│                      (star_db database)                       │
│                                                               │
│               ┌──────────────────────┐                        │
│               │ FACT_SALES           │                        │
│               │ date_key     (FK)    │                        │
│               │ customer_key (FK)    │                        │
│               │ product_key  (FK)    │                        │
│               │ status_key   (FK)    │                        │
│               │ order_id     (DD)    │                        │
│               │ quantity             │                        │
│               │ total_amount         │                        │
│               │ gross_profit         │                        │
│               └──────────┬───────────┘                        │
│                          │                                    │
│      ┌────────────┬──────┴─────┬────────────────┐             │
│      ↓            ↓            ↓                ↓             │
│  ┌────────┐  ┌─────────┐  ┌──────────┐  ┌────────────────┐    │
│  │DIM_DATE│  │DIM_CUST │  │DIM_PROD  │  │DIM_ORDER_STATUS│    │
│  │date_key│  │cust_key │  │prod_key  │  │status_key      │    │
│  │year    │  │name     │  │name      │  │status_code     │    │
│  │quarter │  │city     │  │category  │  │status_name     │    │
│  │month   │  │region   │  │brand     │  └────────────────┘    │
│  │week    │  │type     │  │price     │                        │
│  └────────┘  └─────────┘  └──────────┘                        │
└───────────────────────────────────────────────────────────────┘
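Summary
| Concept | Description |
|---|---|
| Fact table | Stores business measures; sits at the center and joins to every dimension |
| Dimension table | Describes the perspectives for analysis (who / what / when / where) |
| Degenerate dimension (DD) | A dimension key with no attributes of its own, such as an order number, stored directly in the fact table without a separate dimension table |
| Surrogate key | Integer primary key that stands in for the business key and keeps joins compact |
| SCD Type 2 | Slowly changing dimension; tracks history through load_date / end_date |
Star Schema design principles:
- 📌 One fact table surrounded by multiple dimension tables
- 📌 Keep dimension tables "wide" (rich descriptive attributes)
- 📌 Avoid many-to-many relationships where possible
- 📌 Use degenerate dimensions to simplify queries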