I. Linux
Edit scheduled jobs with crontab
List the current user's scheduled jobs
Empty the system log file sys.log without deleting the file
bash
crontab -e
crontab -l
> sys.log
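crontab -e: edit scheduled jobs; essential for offline big-data scheduling and for running scripts on a timer
crontab -l: list the configured scheduled jobs; useful for tracking down duplicate schedules or configuration mistakes
> sys.log: an empty redirect truncates the file; only the content is cleared and the file itself stays, so processes that write to it do not lose their log file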
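The difference between truncating and deleting can be checked with a minimal Python sketch. It assumes a throwaway sys.log in a temporary directory (not a real system log) and uses the fact that opening a file in "w" mode truncates it, much as the shell redirect does. The inode comparison is meaningful on Linux and other POSIX systems.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical log file created only for this demonstration
    log_path = os.path.join(tmp_dir, "sys.log")
    with open(log_path, "w") as f:
        f.write("old log line\n")

    inode_before = os.stat(log_path).st_ino
    size_before = os.path.getsize(log_path)

    # Opening in "w" mode truncates the file in place, like `> sys.log`
    open(log_path, "w").close()

    inode_after = os.stat(log_path).st_ino
    size_after = os.path.getsize(log_path)
    still_exists = os.path.exists(log_path)

print(size_before, size_after, inode_before == inode_after, still_exists)
```

The file keeps its inode, which is why a process that already has the log open can keep writing to it; deleting and recreating the file would produce a new inode instead.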
II. SQL
1148. Article Views I

sql
SELECT DISTINCT author_id AS id
FROM Views
WHERE author_id = viewer_id
ORDER BY author_id;
1173. Immediate Food Delivery I

sql
SELECT
    ROUND(
        SUM(IF(order_date = customer_delivery_date, 1, 0)) / COUNT(*) * 100,
        2
    ) AS immediate_percentage
FROM Delivery;
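IF turns the condition into a 1 / 0 flag
SUM + IF counts the rows that satisfy the condition
Divide by the total row count and multiply by 100 to get the percentage; ROUND keeps 2 decimal places
Standard pattern for business conversion rates and penetration rates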
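The percentage pattern above can be tried locally with Python's built-in sqlite3 module. This is a sketch under assumptions: the three sample rows are made up, and because SQLite is not MySQL, the IF call is swapped for the portable CASE WHEN form and the sum is multiplied by 100.0 first, since SQLite divides integers as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Delivery (
        delivery_id INTEGER,
        customer_id INTEGER,
        order_date TEXT,
        customer_delivery_date TEXT
    )
""")
# Made-up sample data: 2 of 3 orders are delivered on the order date
conn.executemany("INSERT INTO Delivery VALUES (?, ?, ?, ?)", [
    (1, 1, "2025-05-01", "2025-05-01"),
    (2, 2, "2025-05-01", "2025-05-02"),
    (3, 3, "2025-05-02", "2025-05-02"),
])

query = """
    SELECT ROUND(
        SUM(CASE WHEN order_date = customer_delivery_date THEN 1 ELSE 0 END)
            * 100.0 / COUNT(*),
        2
    ) AS immediate_percentage
    FROM Delivery
"""
immediate_percentage = conn.execute(query).fetchone()[0]
conn.close()

print(immediate_percentage)  # 66.67
```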
1280. Students and Examinations

sql
SELECT
    s.student_id,
    s.student_name,
    sub.subject_name,
    COUNT(e.subject_name) AS attended_exams
FROM Students s
CROSS JOIN Subjects sub
LEFT JOIN Examinations e
    ON s.student_id = e.student_id
    AND sub.subject_name = e.subject_name
GROUP BY s.student_id, s.student_name, sub.subject_name
ORDER BY s.student_id, sub.subject_name;
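CROSS JOIN builds the Cartesian product, giving every student × subject combination
LEFT JOIN keeps the combinations with no exam record; their Examinations columns come back as NULL
COUNT(column from the joined table) skips NULLs, so those combinations are counted as 0
Classic template for completing dimensions and zero-filling missing values in a data warehouse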
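The zero-fill behaviour can be seen by running the same query against a tiny in-memory database. This sketch uses Python's built-in sqlite3 module rather than MySQL, and the students, subjects, and exam rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (student_id INTEGER, student_name TEXT);
    CREATE TABLE Subjects (subject_name TEXT);
    CREATE TABLE Examinations (student_id INTEGER, subject_name TEXT);

    -- Invented sample data: Alice sat Math twice, Bob sat Physics once
    INSERT INTO Students VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Subjects VALUES ('Math'), ('Physics');
    INSERT INTO Examinations VALUES (1, 'Math'), (1, 'Math'), (2, 'Physics');
""")

rows = conn.execute("""
    SELECT s.student_id, s.student_name, sub.subject_name,
           COUNT(e.subject_name) AS attended_exams
    FROM Students s
    CROSS JOIN Subjects sub
    LEFT JOIN Examinations e
        ON s.student_id = e.student_id
        AND sub.subject_name = e.subject_name
    GROUP BY s.student_id, s.student_name, sub.subject_name
    ORDER BY s.student_id, sub.subject_name
""").fetchall()
conn.close()

for row in rows:
    print(row)
# (1, 'Alice', 'Math', 2)
# (1, 'Alice', 'Physics', 0)
# (2, 'Bob', 'Math', 0)
# (2, 'Bob', 'Physics', 1)
```

Replacing COUNT(e.subject_name) with COUNT(*) would report 1 instead of 0 for the missing combinations, because the LEFT JOIN still emits one row for each of them.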
III. PySpark
Conditional filter + distinct + sort
Percentage calculation + rounding
Full-dimension Cartesian product + left join to fill zeros
python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # avoids shadowing Python's built-in round()

spark = SparkSession.builder.master("local[*]").appName("Day25").getOrCreate()

# 1. Authors who viewed their own articles
views = spark.createDataFrame(
    [
        (1, 1, 1, "2025-05-04"),
        (2, 2, 3, "2025-05-04"),
    ],
    ["article_id", "author_id", "viewer_id", "view_date"],
)
(
    views.filter(F.col("author_id") == F.col("viewer_id"))
    .select("author_id")
    .distinct()
    .orderBy("author_id")
    .show()
)

# 2. Share of orders delivered on the order date
delivery = spark.createDataFrame(
    [
        (1, 1, "2025-05-01", "2025-05-01"),
        (2, 2, "2025-05-01", "2025-05-02"),
    ],
    ["delivery_id", "customer_id", "order_date", "customer_delivery_date"],
)
# when() without otherwise() yields NULL for non-matching rows,
# and count() skips NULLs, so this counts only same-day deliveries
same_day = F.count(F.when(F.col("order_date") == F.col("customer_delivery_date"), 1))
res = delivery.agg(
    F.round(same_day / F.count("*") * 100, 2).alias("immediate_percentage")
)
res.show()

spark.stop()
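Spark's filter, distinct, and orderBy follow the same logic as the SQL versions
when is the equivalent of SQL IF for conditional logic
Cartesian product + left join fills in every dimension combination, so the output lines up with business reports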
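IV. Algorithms
283. Move Zeroes
Given an array, move all 0s to the end while keeping the relative order of the non-zero elements, without making a copy of the array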
python
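def moveZeroes(nums):
    j = 0  # next position that should hold a non-zero element
    for i in range(len(nums)):
        if nums[i] != 0:
            # swap the non-zero element forward into position j
            nums[j], nums[i] = nums[i], nums[j]
            j += 1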
Two pointers swap in place, with no new array allocated
Time O(n), space O(1)
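A quick run makes the in-place behaviour concrete. The sketch below repeats the two-pointer loop under the hypothetical name move_zeroes so that it runs on its own; the sample input is an assumption chosen for illustration.

```python
def move_zeroes(nums):
    """Two-pointer pass that mutates nums in place and returns None."""
    j = 0  # next position that should hold a non-zero element
    for i in range(len(nums)):
        if nums[i] != 0:
            nums[j], nums[i] = nums[i], nums[j]
            j += 1


sample = [0, 1, 0, 3, 12]
alias = sample  # second reference to the same list object
move_zeroes(sample)

print(sample)           # [1, 3, 12, 0, 0]
print(alias is sample)  # True: the original list was modified, not replaced
```

When nothing has been skipped yet, i equals j and the swap is a no-op, so the loop needs no special case for arrays without zeros.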